8. Benefit of Prior U.S. Application(s) (35 U.S.C. 119(e), 120 or 121) 

NOTE: "In order for an application to claim the benefit of a prior filed copending national application, the prior application must 
claimed in at least one claim of the later filed application in the manner provided by the first paragraph of 35 U.S.C. 112" 
37 CFR 1.78(a). 

NOTE: "in addition, the prior application must be (1) complete as set forth in § 1.51, or (2) entitled to a filing date as set forth in § 
1.53(b) and include the basic filing fee set forth in § 1.16, or (3) entitled to a filing date as set forth in § 1.53(b) and have 
paid therein the processing and retention fee set forth in § 1.21(l) within the time set forth in § 1.53(d)." 37 CFR 1.78(a). 

NOTE: "Any nonprovisional application claiming the benefit of one or more prior filed copending provisional applications must 
contain or be amended to contain in the first sentence of the specification following the title a reference to each such prior 
provisional application, identifying it as a provisional application, and including the provisional application number 
(consisting of the series code and serial number) and filing date." 37 CFR 1.78(a)(4). 

NOTE: "Any nonprovisional application claiming the benefit of one or more prior filed copending nonprovisional applications or 
international applications designating the United States of America must contain or be amended to contain in the first 
sentence of the specification following the title a reference to each such prior application, identifying it by application 
number (consisting of the series code and serial number) and filing date or international application number and 
international filing date and indicating the relationship of the applications. Cross-references to other related applications 
may be made where appropriate. (See § 1.14(b))." 37 CFR 1.78(2). 

_ Applicant(s) hereby claim(s) the benefit of the filing date of prior U.S. Application Serial No. 
filed on . 



(a) Application History (title as originally filed and as last amended, serial number, and filing 
date of all prior applications): 

Title: 
Ser. No.: 
Filed: 

(b) Name of applicant(s) (as originally filed and as last amended), and current 
correspondence address of applicant(s): 

Name: 
Address: 



NOTE: The proper reference to a prior filed PCT application which entered the U.S. national phase is the U.S. serial number and 
the filing date of the PCT application which designated the U.S. 

NOTE: (1) Where the application being transmitted adds subject matter to the International Application then the filing can be as a 
continuation-in-part or (2) it is desired to do so for other reasons, then the filing can be as a continuation. 

NOTE: The deadline for entering the national phase in the U.S. for an international application was clarified in the Notice of April 
28, 1987 (1079 O.G. 32 to 46) as follows: 

"The Patent and Trademark Office considers the international application to be pending until the 22nd month from the 
priority date if the United States has been designated and no Demand for International Preliminary Examination has been 
filed prior to the expiration of the 19th month from the priority date and until the 32nd month from the priority date if a 
Demand for International Preliminary Examination which elected the United States of America has been filed prior to the 
expiration of the 19th month from the priority date, provided that a copy of the international application has been 
communicated to the Patent and Trademark Office within the 20 or 30 month period respectively. If a copy of the 
international application has not been communicated to the Patent and Trademark Office within the 20 or 30 month 
period, respectively, the international application becomes abandoned as to the United States 20 or 30 months from the 
priority date, respectively. These periods have been placed in the rules as paragraph (h) of § 1.494 and paragraph (i) of 
§ 1.495. A continuing application under 35 U.S.C. 365(c) and 120 may be filed anytime during the pendency of the 
international application." 

9. Priority Claim for Prior Application (35 U.S.C. 119) 



The prior U.S. application(s), including any prior International Application designating the U.S. 
identified above in item 8, in turn itself claim(s) foreign priority (ies) as follows: 



(country) (appln. no.) (filed on) 



(country) (appln. no.) (filed on) 



(country) (appln. no.) (filed on) 

The certified copy (ies) 

_ is (are) attached. 

_ has (have) been filed on in prior application serial number 

which was filed on . 



will follow. 



WARNING: The certified copy of the priority application which may have been communicated to the PTO by the 
International Bureau may not be relied on without the need to file a certified copy of the priority application in a 
continuing application. This is so because the certified copy of the priority application communicated by the 
International Bureau is placed in a folder and is not assigned a U.S. serial number unless the national stage is 
entered. Such folders are disposed of if the national stage is not entered. Therefore, such certified copies may 
not be available if needed later in the prosecution of a continuing application. An alternative would be to 
physically remove the priority documents from the folders and transfer them to the continuing application. The 
resources required to request transfer, retrieve the folders, make suitable record notations, transfer the certified 
copies, enter and make a record of such copies in the continuing application are substantial. Accordingly, the 
priority documents in folders of international applications which have not entered the national stage may not be 
relied on. Notice of April 28, 1987 (1079 O.G. 32 to 46). 



10. Further Inventorship Statement Where Benefit of Prior Application(s) Claimed 

NOTE: "If the continuation, continuation-in-part, or divisional application is filed by less than all the inventors named in the prior 
application, a statement must accompany the application when filed requesting deletion of the names of the person or 
persons who are not inventors of the invention being claimed in the continuation, continuation-in-part, or divisional 
application." 37 CFR 1.62(a) [emphasis added] (dealing with the file wrapper continuation situation). 

NOTE: "In the case of a continuation-in-part application which adds and claims additional disclosure by amendment, an oath or 
declaration as required by § 1.63 must be filed. In those situations where a new oath or declaration is required due to 
additional subject matter being claimed, additional inventors may be named in the continuing application. In a 
continuation or divisional application which discloses and claims only subject matter disclosed in a prior application, no 
additional oath or declaration is required and the application must name as inventors the same or less than all the 
inventors in the prior application." 37 CFR 1.60(c) (dealing with the continuation situation). 

(complete applicable item (a) or (b) below) 

(a) This application discloses and claims only subject matter disclosed in the prior application 

whose particulars are set out above and the inventor(s) in this application are 

the same 

less than those named in the prior application and it is requested that the 

following inventor(s) identified above for the prior application be deleted: 

Name: 

Name: 

Name: 

(b) This application discloses and claims additional disclosure and a new declaration or oath 
is being filed. With respect to the prior application whose particulars are set out above, 
the inventors in this application are 

the same 

add the following inventors 

Name: 

Name: 

Name: 

11. Maintenance of Copendency of Prior Application 

NOTE: The PTO finds it useful if a copy of the petition filed in the prior application extending the term for response is filed with 
the papers constituting the filing of the continuation application. Notice of November 5, 1985 (1060 O.G. 27). 

Extension of time in prior application 

(This item must be completed and the necessary papers filed in the prior application if the period 
set in the prior application has run) 

A petition, fee and response has been filed to extend the term in the prior application until 

A copy of the petition for extension of time in the prior application is attached. 

(complete this item and file conditional petition in prior application if previous item not applicable) 

Conditional Petition For Extension Of Time In Prior Application 

A conditional petition for extension of time is being filed in the pending prior application. 

12. Abandonment of Prior Application (if applicable) 



Please abandon the prior application at a time while the prior application is pending or when the 
petition for extension of time or to revive in that application is granted and when this application is 
granted a filing date so as to make this application copending with said prior application. 

NOTE: According to the Notice of May 13, 1983, (1031 TMOG 6-7), the filing of a continuation or continuation-in-part application 
is a proper response with respect to a petition for extension of time or a petition to revive and should include the express 
abandonment of the prior application conditioned upon the granting of the petition and the granting of a filing date to the 
continuing application. 

NOTE: "A registered attorney or agent acting under the provisions of § 1.34(a), or of record, may also expressly abandon a prior 
application as of the filing date granted to a continuing application when filing such a continuing application." 37 CFR 
1.138. 



13. Petition For Suspension Of Prosecution For The Time Necessary To File An Amendment 
(if applicable) 



WARNING: "The claims of a new application may be finally rejected in the first Office Action in those situations where (1) 
the new application is a continuing application of, or a substitute for, an earlier application, and (2) all the 
claims of the new application (a) are drawn to the same invention claimed in the earlier application, and (b) 
would have been properly rejected on the grounds of art of record in the next Office Action if they had been 
entered in the earlier application." MPEP § 706.07(b). 

NOTE: Where it is possible that the claims on file will give rise to a first action final for this continuation application and 
for some reason an amendment cannot be filed promptly (e.g., experimental data is being gathered) it may be 
desirable to file a petition for suspension of prosecution for the time necessary. 

(check the next item, if applicable) 

There is provided herewith a Petition to Suspend Prosecution For The Time Necessary To File An 
Amendment (New Application Filed Concurrently) 



14. Notification in Parent Application of this Filing (if applicable) 

A notification of the filing of this application is being filed in the parent application from which this 
application claims priority under 35 U.S.C. 120. 

15. Fee Calculation (37 CFR 1.16) 



A. X Regular Application 

CLAIMS AS FILED 

                                       Number filed   Number Extra     Rate         Basic Fee 
                                                                                    $ 760.00 
Total Claims (37 CFR 1.16(c))          25 - 20 =      5             x  $18.00  =       90.00 
Independent Claims (37 CFR 1.16(b))     9 - 3  =      6             x  $78.00  =      468.00 
Multiple dependent claim(s), 
if any (37 CFR 1.16(d))                                             +  $260.00 = 

Amendment canceling extra claims enclosed. 
Amendment deleting multiple-dependencies enclosed. 
Fee for extra claims is not being paid at this time. 

Filing Fee Calculation                                                            $ 1,318.00 



B. _ Design application 

($310.00-37 CFR 1.16(f)) 



Filing Fee Calculation 



C. _ Plant application 

($480.00-37 CFR 1.16(g)) 



Filing Fee Calculation 



16. Small Entity Statement(s) 

X Verified Statement(s) that this is a filing by a small entity under 37 CFR 1.9 and 1.27 
is (are) attached. 

X will follow. 



Status as a small entity was claimed in prior application serial number 

filed on , from which benefit is being claimed for this 

application under 35 U.S.C. 119(e), 120, 121 or 365(c) and which status as a small entity 
is still proper and desired. A copy of the verified statement in the prior application is 
included. 

Filing Fee Calculation (50% of A, B or C above) $ 659.00 
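
The arithmetic behind items 15 and 16 can be checked with a short sketch. The rates used are the ones printed on this form ($760.00 basic fee, $18.00 per claim over 20, $78.00 per independent claim over 3, 50% small entity reduction); the function name `filing_fee` and its parameters are illustrative assumptions and are not part of any PTO form or system.

```python
def filing_fee(total_claims: int,
               independent_claims: int,
               basic_fee: float = 760.00,
               extra_claim_rate: float = 18.00,
               extra_independent_rate: float = 78.00,
               multiple_dependent_fee: float = 0.00,
               small_entity: bool = False) -> float:
    """Compute the filing fee using the rates printed on this transmittal form."""
    # Claims in excess of 20 total and 3 independent are charged per claim.
    extra_total = max(total_claims - 20, 0)
    extra_independent = max(independent_claims - 3, 0)
    fee = (basic_fee
           + extra_total * extra_claim_rate
           + extra_independent * extra_independent_rate
           + multiple_dependent_fee)
    # Item 16: a small entity pays 50% of the calculated fee.
    return fee / 2 if small_entity else fee


if __name__ == "__main__":
    print(filing_fee(25, 9))                     # regular application
    print(filing_fee(25, 9, small_entity=True))  # small entity
```

With 25 total claims and 9 independent claims, the sketch gives the $1,318.00 and $659.00 figures entered above.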

17. Request for International-Type Search (37 CFR 1.104(d)) 

_ Please prepare an international-type search report for this application at the time when 
national examination on the merits takes place. 

18. Fee Payment Being Made At This Time 

X Not Enclosed 

X No filing fee is to be paid at this time. (This and the surcharge required by 37 CFR 
1.16(e) can/will be paid subsequently.) 
Enclosed 

basic filing fee $ 

_ recording assignment ($40.00; 37 CFR 1.21(h)) $ 

_ petition fee for filing by other than all the 
inventors or person on behalf of the inventor 
where inventor refused to sign or cannot be 
reached. ($130.00; 37 CFR 1.47 and 1.17(h)) $ 

for processing an application with a 
specification in a non-English language. 
($130.00; 37 CFR 1.52(d) and 1.17(k)) $ 

processing and retention fee 
($130.00; 37 CFR 1.53(d) and 1.21(l)) $ 

fee for international-type search report. 
($40.00; 37 CFR 1.21(e)) $ 



Total Fees Enclosed 



19. Method of Payment of Fees 

Check in the amount of $ 

_ Charge Account No. in the amount of $ 

A duplicate of this transmittal is attached. 



20. Authorization to Charge Additional Fees 

_ The Commissioner is hereby authorized to charge the following additional fees by this 
paper and during the entire pendency of this application to Account No. ; 

_ 37 CFR 1.16(a), (f) or (g) (filing fees) 

_ 37 CFR 1.16(b), (c) and (d) (presentation of extra claims) 

37 CFR 1.16(e) (surcharge for filing the basic filing fee and/or declaration on a 
date later than the filing date of the application) 

37 CFR 1.18 (application processing fees) 

37 CFR 1.18 (issue fee at or before mailing of Notice of Allowance, pursuant to 
37 CFR 1.311(b)) 



21. Instructions As To Overpayment 

_ credit Account No. 
X refund 



22. Incorporation By Reference of Papers Identified Herein 

Applicant(s) hereby incorporate(s) by reference all papers which are identified in this New 
Application Transmittal. 



Dated: 




O'BANION & RITCHEY LLP 
400 Capitol Mall, Suite 1550 
Sacramento, CA 95814 
Telephone: (916) 498-1010 

BACKGROUND OF THE INVENTION 
1. Field of the Invention 

This invention relates generally to digital communications and, in particular, to 
parametric speech coding and decoding methods and apparatus. 
2. Description of the Background Art 

For the purpose of definition, it should be noted that the term "vocoder" is frequently 
used to describe voice coding methods wherein voice parameters are transmitted instead of 
digitized waveform samples. In the production of digitized waveform samples, an incoming 
waveform is periodically sampled and digitized into a stream of digitized waveform data which 
can be converted back to an analog waveform virtually identical to the original waveform. The 
encoding of a voice using voice parameters provides sufficient accuracy to allow subsequent 
synthesis of a voice which is substantially similar to the one encoded. Note that the use of voice 
parameter encoding does not provide sufficient information to exactly reproduce the voice 
waveform, as is the case with digitized waveforms; however, the voice can be encoded at a lower 
data rate than is required with waveform samples. 

In the speech coding community, the term "coder" is often used to refer to a speech 
encoding and decoding system, although it also often refers to an encoder by itself. As used 
herein, the term encoder generally refers to the encoding operation of mapping a speech signal to 
a compressed data signal (the bitstream), and the term decoder generally refers to the decoding 
operation where the data signal is mapped into a reconstructed or synthesized speech signal. 

Digital compression of speech (also called voice compression) is increasingly important 
for modern communication systems. Low bit rates in the range of 500 bps (bits per 
second) to 2 kbps (kilobits per second) are desirable for efficient and 
secure voice communication over high frequency (HF) and other radio channels, for satellite 
voice paging systems, for multi-player Internet games, and for numerous additional applications. 
Most compression methods (also called "coding methods") for 2.4 kbps, or below, are based on 
parametric vocoders. The majority of contemporary vocoders of interest are based on variations 
of the classical linear predictive coding (LPC) vocoder and enhancements of that technique, or 
are based on sinusoidal coding methods such as harmonic coders and multiband excitation 
coders [1]. Recently an enhanced version of the LPC vocoder has been developed which is 
called MELP (Mixed Excitation Linear Prediction) [2, 5, 6]. The present invention can provide 
similar voice quality levels at a lower bit rate than is required in the conventional encoding 
methods described above. 

This invention is generally described in relation to its use with MELP, since MELP 
coding has advantages over other frame-based coding methods. However the invention is 
applicable to a variety of coders, such as harmonic coders [15], or multiband excitation (MBE) 
type coders [14]. 

The MELP encoder observes the input speech and, for each 22.5 ms frame, it generates 
data for transmission to a decoder. This data consists of bits representing line spectral 
frequencies (LSFs) (which are a form of linear prediction parameter), Fourier magnitudes 
(sometimes called "spectral magnitudes"), gains (2 per frame), pitch and voicing, and additionally 
contains an aperiodic flag bit, error protection bits, and a synchronization (sync) bit. FIG. 1 
shows the buffer structure used in a conventional 2.4 kbps MELP encoder. The encoder 
employed with other harmonic or MBE coding methods generates data representing many of the 
same or similar parameters (typically these are LSFs, spectral magnitudes, gain, pitch, and 
voicing). The MELP decoder receives these parameters for each frame and synthesizes a 
corresponding frame of speech that approximates the original frame. 

Different communication systems require speech coders with different bit-rates. For 
example, a high frequency (HF) radio channel may have severely limited capacity and require 
extensive error correction, so that a bit rate of 1.2 kbps may be most suitable for representing the 
speech parameters, whereas a secure voice telephone communication system often requires a bit 
rate of 2.4 kbps. In some applications it is necessary to interconnect different communication 
systems so that a voice signal originally encoded for one system at one bit rate is subsequently 
converted into an encoded voice signal at the other bit rate for another system. This conversion 
is referred to as "transcoding", and it can be performed by a "transcoder" typically located at a 
gateway between two communication systems. 

BRIEF SUMMARY OF THE INVENTION 

In general terms, the present invention takes an existing vocoder technique, such as 
MELP, and substantially reduces the bit rate, typically by a factor of two, while maintaining 
approximately the same reproduced voice quality. The existing vocoder techniques are used 
within the invention, and they are therefore referred to as "baseline" coding or alternately 
"conventional" parametric voice encoding. 

By way of example, and not of limitation, the present invention comprises a 1.2 kbps 
vocoder that has analysis modules similar to a 2.4 kbps MELP coder to which an additional 
superframe vocoder is overlaid. A block or "superframe" structure comprising three 
consecutive frames is adopted within the superframe vocoder to more efficiently quantize the 
parameters that are to be transmitted for the 1.2 kbps vocoder of the present invention. To 
simplify the description, the superframe is chosen to encode three frames, as this ratio has been 
found to perform well. It should be noted, however, that the inventive methods can be applied to 
superframes comprising any discrete number of frames. A superframe structure has been 
mentioned in previous patents and publications [9], [10], [11], [13]. Within the MELP coding 
standard, each time a frame is analyzed (e.g., every 22.5 ms), its parameters are encoded and 
transmitted. However, in the present invention each frame of a superframe is concurrently 
available in a buffer, each frame is analyzed, and the parameters of all three frames within the 
superframe are simultaneously available for quantization. Although this introduces additional 
encoding delay, the temporal correlation that exists among the parameters of the three frames can 
be efficiently exploited by quantizing them together rather than separately. 
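
The grouping of consecutive frames into superframes described above can be illustrated with a minimal sketch. It assumes the 180-sample frame length of the MELP standard and three frames per superframe; the function name `superframes` is hypothetical, and history and look-ahead samples are deliberately left out.

```python
from typing import Iterator, List, Sequence

FRAME_SAMPLES = 180        # one 22.5 ms frame at 8000 samples per second
FRAMES_PER_SUPERFRAME = 3  # ratio used throughout this description


def superframes(samples: Sequence[float],
                frame_len: int = FRAME_SAMPLES,
                n_frames: int = FRAMES_PER_SUPERFRAME
                ) -> Iterator[List[Sequence[float]]]:
    """Yield lists of n_frames consecutive frames.

    A trailing partial superframe is dropped. History and look-ahead
    samples are not modeled in this simplified illustration.
    """
    block = frame_len * n_frames
    for start in range(0, len(samples) - block + 1, block):
        yield [samples[start + i * frame_len:start + (i + 1) * frame_len]
               for i in range(n_frames)]


if __name__ == "__main__":
    speech = list(range(1200))  # placeholder signal, not real speech
    for k, frames in enumerate(superframes(speech)):
        print(k, [len(f) for f in frames])
```

Because all frames of a superframe are yielded together, any per-frame parameters computed from them are available at the same time for joint quantization, at the cost of waiting for the full block.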

The frame size of the 1.2 kbps coder of the present invention is preferably 22.5 ms (or 
180 samples of speech) at a sampling rate of 8000 samples per second, which is the same as in 
the MELP standard coder. However, in order to avoid large pitch errors, the length of the 
look-ahead is increased in the invention by 129 samples. In this regard, note that the term 
"look-ahead" refers to the time duration of the "future" speech segment beyond the current frame 
boundary that must be available in the buffer for processing needed to encode the current frame. 
A pitch smoother is also used in the 1.2 kbps coder of the present invention, and the algorithmic 
delay for the 1.2 kbps coder is 103.75 ms. The transmitted parameters for the 1.2 kbps coder are 
the same as for the 2.4 kbps MELP coder. 

Within the MELP coding standard, the low band voicing decision or Unvoiced/Voiced 
decision (U/V decision) is found for each frame. The frame is said to be "voiced" when the low 
band voicing value is "1", and "unvoiced" when it is "0". This voicing condition determines 
which of two different bit allocations is used for the frame. However, in the 1.2 kbps coder of 
the present invention, each superframe is categorized into one of several coding states with a 
different bit allocation for each state. State selection is done according to the U/V (unvoiced or 
voiced) pattern of the superframe. If a channel bit error leads to an incorrect state identification 
by the decoder, serious degradation of the synthesized speech for that superframe will result. 
Therefore, an aspect of the present invention comprises techniques, developed and integrated 
into the decoder, that reduce the effect of state mismatch between encoder and decoder due to 
channel errors. 
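
State selection from the U/V pattern can be pictured as a table lookup keyed by the three per-frame voicing flags. The sketch below is only an illustration: the names `uv_pattern`, `select_state` and `EXAMPLE_TABLE` are hypothetical, and the example table assigns one state per pattern purely for demonstration, whereas the coder's actual states and bit allocations are not reproduced here.

```python
from itertools import product
from typing import Dict, List, Sequence


def uv_pattern(voiced_flags: Sequence[int]) -> str:
    """Render per-frame low band voicing flags (1 = voiced) as a string such as 'VUV'."""
    return "".join("V" if flag else "U" for flag in voiced_flags)


# All eight U/V patterns that a three-frame superframe can take.
ALL_PATTERNS: List[str] = [uv_pattern(p) for p in product((0, 1), repeat=3)]

# Hypothetical table: one state index per pattern. A real coder may map
# several patterns to the same state; that grouping is an assumption here.
EXAMPLE_TABLE: Dict[str, int] = {pat: idx for idx, pat in enumerate(ALL_PATTERNS)}


def select_state(voiced_flags: Sequence[int], state_table: Dict[str, int]) -> int:
    """Look up the coding state for a superframe from its U/V pattern."""
    return state_table[uv_pattern(voiced_flags)]


if __name__ == "__main__":
    sent = [1, 0, 1]
    received = [1, 1, 1]  # one flag corrupted by a channel error
    print(select_state(sent, EXAMPLE_TABLE), select_state(received, EXAMPLE_TABLE))
```

The usage lines show why a single corrupted voicing flag matters: encoder and decoder would select different states and therefore interpret the remaining bits under different allocations.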

In the present invention, three frames of speech are simultaneously available in a memory 
buffer and each frame is separately analyzed by conventional MELP analysis modules, 
generating (unquantized) parameter values for each of the three frames. These parameters are 
collectively available for subsequent processing and quantization. The pitch smoother observes 
pitch and U/V decisions for the three frames and also performs additional analysis on the 
buffered speech data to extract parameters needed to classify each frame as one of two types 
(onset or offset) for use in a pitch smoothing operation. The smoother then outputs modified 
(smoothed) versions of the pitch decisions, and these pitch values for the superframe are then 
quantized. The bandpass voicing smoother observes the bandpass voicing strengths for the three 
frames, examines energy values extracted directly from the buffered speech, and then 
determines a cutoff frequency for each of the three frames. The bandpass voicing strengths are 
parameters generated by the MELP encoder to describe the degree of voicing in each of five 
frequency bands of the speech spectrum. The cutoff frequencies, defined later, describe the time 
evolution of the bandwidth of the voiced part of the speech spectrum. The cutoff frequency for 
each voiced frame in the superframe is encoded with 2 bits. The LSF parameters, jitter 
parameter, and Fourier magnitude parameters for the superframe are each quantized. Binary data 
is obtained from the quantizers for transmission. Not described for the sake of simplicity are the 
error correction bits, synchronization bit, parity bit, and the multiplexing of the bits into a serial 
data stream for transmission, all of which are well-known to those skilled in the art. At the 
receiver, the data bits for the various parameters are extracted, decoded and applied to inverse 
quantizers that recreate the quantized parameter values from the compressed data. A receiver 
typically includes a synchronization module which identifies the starting point of a superframe, 
and a means for error correction decoding and demultiplexing. The recovered parameters for 
each frame can be applied to a synthesizer. After decoding, the synthesized speech frames are 
concatenated to form the speech output signal. The synthesizer may be a conventional 
frame-based synthesizer, such as MELP, or it may be provided by an alternative method as disclosed 
herein. 
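
As one small illustration of the bit budget described above, the following sketch packs a 2-bit cutoff frequency index for each voiced frame of a superframe and recovers the indices at the receiver from the same U/V pattern. The function names, the bit ordering (first voiced frame in the most significant bits) and the use of `None` for unvoiced frames are assumptions made for the example, not details taken from this disclosure.

```python
from typing import List, Optional, Sequence, Tuple


def pack_cutoffs(voiced_flags: Sequence[int],
                 cutoff_indices: Sequence[int]) -> Tuple[int, int]:
    """Pack one 2-bit cutoff index per voiced frame.

    Returns (value, n_bits). Unvoiced frames contribute no bits.
    """
    value = 0
    n_bits = 0
    for voiced, idx in zip(voiced_flags, cutoff_indices):
        if not voiced:
            continue
        if not 0 <= idx <= 3:
            raise ValueError("cutoff index must fit in 2 bits")
        value = (value << 2) | idx
        n_bits += 2
    return value, n_bits


def unpack_cutoffs(voiced_flags: Sequence[int], value: int) -> List[Optional[int]]:
    """Recover the per-frame cutoff indices; unvoiced frames yield None."""
    n_voiced = sum(1 for v in voiced_flags if v)
    shift = 2 * (n_voiced - 1)
    out: List[Optional[int]] = []
    for voiced in voiced_flags:
        if voiced:
            out.append((value >> shift) & 0b11)
            shift -= 2
        else:
            out.append(None)
    return out


if __name__ == "__main__":
    flags = [1, 0, 1]
    packed, n_bits = pack_cutoffs(flags, [2, 0, 3])
    print(packed, n_bits, unpack_cutoffs(flags, packed))
```

Note that the receiver needs the correct U/V pattern to know how many cutoff bits to read, which is another reason state mismatch between encoder and decoder is damaging.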

An object of the invention is to introduce greater coding efficiencies and exploit the 
correlation from one frame of speech to another by grouping frames into superframes and 
performing novel quantization techniques on the superframe parameters. 

Another object of the invention is to allow the existing speech processing functions of the 
baseline encoder and decoder to be retained so that the enhanced coder operates on the 
parameters found in the baseline coder operation, thereby preserving the wealth of 
experimentation and design results already obtained with baseline encoders and decoders while 
still offering greatly reduced bit rates. 

Another object of the invention is to provide a mechanism for transcoding, wherein a bit 
stream obtained from the enhanced encoder is converted (transcoded) into a bit stream that will 
be recognized by the baseline decoder, while similarly providing a way to convert the bit stream 
coming from a baseline encoder into a bit stream that can be recognized by an enhanced decoder. 
This transcoding feature is important in applications where terminal equipment implementing a 
baseline coder/decoder must communicate with terminal equipment implementing the enhanced 
coder/decoder. 

Another object of the invention is to provide methods for improving the performance of 
the MELP encoder, wherein new methods generate the pitch and voicing parameters. 

Another object of the invention is to provide a new decoding procedure that replaces the 
MELP decoding procedure and substantially reduces complexity while maintaining the 
synthesized voice quality. 

Another object of the invention is to provide a 1.2 kbps coding scheme that gives 
approximately equal quality to the MELP standard coder operating at 2.4 kbps. 

Further objects and advantages of the invention will be brought out in the following 
portions of the specification, wherein the detailed description is for the purpose of fully 
disclosing preferred embodiments of the invention without placing limitations thereon. 

BRIEF DESCRIPTION OF THE DRAWINGS 
The invention will be more fully understood by reference to the following drawings 
which are for illustrative purposes only: 

FIG. 1 is a diagram of data positions used within the input speech buffer structure of a 
conventional 2.4 kbps MELP coder. The units shown indicate samples of speech. 

FIG. 2 is a diagram of data positions used within the input superframe speech buffer 
structure of the 1.2 kbps coder of the present invention. The units shown indicate samples of 
speech. 

FIG. 3A is a functional block diagram of the 1.2 kbps encoder of the present invention. 

FIG. 3B is a functional block diagram of the 1.2 kbps decoder of the present invention. 

FIG. 4 is a diagram of data positions within the 1.2 kbps encoder of the present invention 
showing computation positions for computing pitch smoother parameters within the present 
invention, where the units shown indicate samples of speech. 

FIG. 5A is a functional block diagram of a 1200 bps stream up-converted by a transcoder 
into a 2400 bps stream. 

FIG. 5B is a functional block diagram of a 2400 bps stream down-converted by a 
transcoder into a 1200 bps stream. 

FIG. 6 is a functional block diagram of hardware within a digital vocoder terminal which 
employs the inventive principles in accord with the present invention. 

DETAILED DESCRIPTION OF THE INVENTION 

For illustrative purposes the present invention will be described with reference to FIG. 2 
through FIG. 6. It will be appreciated that the apparatus may vary as to configuration and as to 
details of the parts, and that the method may vary as to the specific steps and sequence, without 
departing from the basic concepts as disclosed herein. 

1. OVERVIEW OF THE VOCODER 

The 1.2 kbps encoder of the present invention employs analysis modules similar to those 
used in a conventional 2.4 kbps MELP coder, but adds a block or "superframe" encoder which 
encodes three consecutive frames and quantizes the transmitted parameters more efficiently to 
provide the 1.2 kbps vocoding. Those skilled in the art will appreciate that although the 
invention is described with reference to using three frames per superframe, the method of the 
invention can be applied to superframes comprising other integral numbers of frames as well. 
Furthermore, those skilled in the art will also appreciate that although the invention is described 
with respect to the use of MELP as the baseline coder, the methods of the invention can be 
applied to other harmonic vocoders. Such vocoders may have a similar, but not identical, set of 
parameters extracted from analysis of a speech frame and the frame size and bit rates may be 
different from those used in the description presented here. 

It will be appreciated that when a frame is analyzed within a MELP encoder (e.g., every 
22.5 ms), voice parameters are encoded for each frame and then transmitted. In the present 
invention, by contrast, data from a group of frames forming a superframe is collected and 
processed, so that the parameters of all three frames in the superframe are simultaneously 
available for quantization. Although this introduces additional encoding delay, the temporal 
correlation that exists among the parameters of the three frames can be efficiently exploited by 
quantizing them together rather than separately. 

The frame size employed in the present invention is preferably 22.5 ms (or 180 samples
of speech) at a sampling rate of 8000 samples per second, which is the same sample rate used in
the original MELP coder. The buffer structure of a conventional 2.4 kbps MELP coder is shown in
FIG. 1. The length of the look-ahead buffer has been increased in the preferred embodiment by 129
samples, so as to reduce the occurrence of large pitch errors, although the invention can be
practiced with various levels of look-ahead. Additionally, a pitch smoother has been introduced
to further reduce pitch errors. The algorithmic delay for the 1.2 kbps coder described is 103.75
ms. The transmitted parameters for the 1.2 kbps coder are the same as for the 2.4 kbps MELP
coder. The buffer structure of the present invention can be seen in FIG. 2.
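The frame and superframe sizes stated above fix the number of bits available per block. The following is a minimal sketch of that arithmetic in C; the helper names are hypothetical and are not part of the disclosed coder, and only the sample rate, frame length, frames per superframe and channel rates are taken from the description.

```c
/* Values taken from the description: 8000 samples/s, 180-sample frames,
 * three frames per superframe. Helper names are illustrative only. */
#define SAMPLE_RATE_HZ   8000L
#define FRAME_SAMPLES    180L
#define FRAMES_PER_SUPER 3L

/* Duration of one frame in microseconds (180 / 8000 s). */
long frame_duration_us(void)
{
    return FRAME_SAMPLES * 1000000L / SAMPLE_RATE_HZ;
}

/* Number of samples spanned by one superframe. */
long superframe_samples(void)
{
    return FRAME_SAMPLES * FRAMES_PER_SUPER;
}

/* Bits available for a block of `samples` samples at `bits_per_second`. */
long bits_per_block(long bits_per_second, long samples)
{
    return bits_per_second * samples / SAMPLE_RATE_HZ;
}
```

Under these figures, bits_per_block(2400, FRAME_SAMPLES) gives the per-frame budget of the 2.4 kbps coder, and bits_per_block(1200, superframe_samples()) gives the per-superframe budget of the 1.2 kbps coder.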
1.1 Bit Allocation 

In MELP coding, the low band voicing decision, or U/V decision, is made for
each frame: the frame is "voiced" when the low band voicing value is 1 and "unvoiced" when it is 0. However,
in the 1.2 kbps coder of the present invention each superframe is categorized into one of several
coding states employing different quantization schemes. State selection is performed according
to the U/V pattern of the superframe. If a channel bit error leads to an incorrect state
identification by the decoder, serious degradation of the synthesized speech for that superframe
will result. Therefore, techniques to reduce the effect of state mismatch between encoder and
decoder due to channel errors have been developed and integrated into the decoder. For
comparison purposes, the bit allocation schemes for both the 2.4 kbps (MELP) coder and the 1.2
kbps coder are shown in Table 1.

FIG. 3A is a general block diagram of the 1.2 kbps coding scheme 10 in accord with the
present invention. Input speech 12 fills a memory buffer called a superframe buffer 14, which
comprises a superframe and in addition stores the history samples that preceded the start of the
oldest of the three frames and the look-ahead samples that follow the most recent of the three
frames. The actual range of samples stored in this buffer for the preferred embodiment is as
shown in FIG. 2. Frames within the superframe buffer 14 are separately analyzed by
conventional MELP analysis modules 16, 18, 20, which generate a set of unquantized parameter
values 22 for each of the frames within the superframe buffer 14. Specifically, a MELP analysis
module 16 operates on the first (oldest) frame stored in the superframe buffer, another MELP
analysis module 18 operates on the second frame stored in the buffer, and another MELP
analysis module 20 operates on the third (most recent) frame stored in the buffer. Each MELP
analysis block has access to a frame plus prior and future samples associated with that frame.
The parameters generated by the MELP analysis modules are collected to form the set of
unquantized parameters stored in memory unit 22, which is available for subsequent processing
and quantization. The pitch smoother 24 observes pitch values for the frames within the
superframe buffer 14, in conjunction with a set of parameters computed by the smoothing
analysis block 26, and outputs modified versions of the pitch values, which are then quantized
28. A bandpass voicing smoother 30 observes an average energy value computed by the energy
analysis module 32 and it also observes the bandpass voicing strengths for the frames within the
superframe buffer 14 and suitably modifies them for subsequent quantization by the bandpass
voicing quantizer 32. An LSP quantizer 34, Jitter quantizer 36, and Fourier magnitudes
quantizer 38 each output encoded data. Encoded binary data is obtained from the quantizers for
transmission. Not shown for simplicity are the generation of error correction data bits, a
synchronization bit, and multiplexing of the bits into a serial data stream for transmission, which
those skilled in the art will readily understand how to implement.
At the decoder 50, shown in FIG. 3B, the data bits for the various parameters are
contained in the channel data 52, which enters a decoding and inverse quantizer 54, which
extracts, decodes and applies inverse quantizers to recreate the quantized parameter values from
the compressed data. Not shown are the synchronization module (which identifies the starting
point of a superframe) and the error correction decoding and demultiplexing, which those skilled
in the art will readily understand how to implement. The recovered parameters for each frame
are then applied to conventional MELP synthesizers 56, 58, 60. It should be noted that this
invention includes an alternative method of synthesizing speech for each frame that is entirely
different from the prior art MELP synthesizer. After being decoded, the synthesized speech
frames 62, 64, 66 are concatenated to form the speech output signal 68.
2. SPEECH ANALYSIS
2.1 Overview

The basic structure of the encoder is based on the same analysis module used in the 2.4
kbps MELP coder except that a new pitch smoother and bandpass-voicing smoother are added to
take advantage of the superframe structure. The coder extracts the feature parameters from three
successive frames in a superframe using the same MELP analysis algorithm, operating on each
frame, as used in the 2.4 kbps MELP coder. The pitch and bandpass voicing parameters are
enhanced by smoothing. This enhancement is possible because of the simultaneous availability
of three adjacent frames and the look-ahead. By operating in this manner on the superframe, the
parameters for all three frames are available as input data to the quantization modules, thereby
allowing more efficient quantization than is possible when each frame is separately and
independently quantized.
2.2 Pitch Smoother

The pitch smoother takes the pitch estimates from the MELP analysis module for each
frame in the superframe and a set of parameters from the smoothing analysis module 26 shown
in FIG. 3A. The smoothing analysis module 26 computes a set of new parameters every half
frame (11.25 ms) from direct observation of the speech samples stored in the superframe buffer.
The nine computation positions in the current superframe are illustrated in FIG. 4. Each
computation position is at the center of a window in which the parameters are computed. The
computed parameters are then applied as additional information to the pitch smoother.

In the 1.2 kbps encoder, each frame is classified into two categories, comprising either 
onset or offset frames in order to guide the pitch smoothing process. The new waveform feature 
parameters computed by the smoothing analysis module 26, and then used by the pitch smoother 
module 24 for the onset/offset classification, are as follows:



Abbreviation     Description

subEnergy        energy in dB
zeroCrosRate     zero crossing rate
peakiness        peakiness measurement
corx             maximum correlation coefficient of input speech
lowBandCorx      maximum correlation coefficient of 500 Hz low pass filtered speech
lowBandEn        energy of low pass filtered speech
highBandEn       energy of high pass filtered speech



Input speech is denoted as x(n), n = ..., 0, 1, ..., where x(0) corresponds to the speech sample that
is 45 samples to the left of the current computation position, and N is 90 samples, which is half
of the frame size. The parameters are computed as follows:
(1) Energy:

$$\mathrm{subEnergy} = 10\log_{10}\left[\sum_{n=0}^{N-1} x^2(n)\right]$$

(2) Zero crossing rate:

$$\mathrm{zeroCrosRate} = \sum_{i=0}^{N-2}\left[\,x(i)\,x(i+1) > 0 \;?\; 0 : 1\,\right]$$

where the expression in square brackets has value 1 when the product x(i)*x(i+1) is negative
(i.e., when a zero crossing occurs) and otherwise it has value zero.
(3) Peakiness measurement in speech domain:

$$\mathrm{peakiness} = \frac{\sqrt{\frac{1}{N}\sum_{n=0}^{N-1} x^2(n)}}{\frac{1}{N}\sum_{n=0}^{N-1} \left|x(n)\right|}$$

The peakiness measure is defined as in the MELP coder [5]. Here, however, this measure is
computed from the speech signal itself, whereas in MELP it is computed from the prediction
residual signal that is derived from the speech signal.
(4) Maximum correlation coefficient in pitch search range:
First the input speech signal is passed through a low-pass filter with an 800 Hz cutoff
frequency, where:

$$H(z) = \frac{0.3069}{1 - 2.4552z^{-1} + 2.4552z^{-2} - 1.152z^{-3} + 0.2099z^{-4}}$$

The low-pass filtered signal is passed through a 2nd order LPC inverse filter. The inverse filtered
signal is denoted as $s_{lv}(n)$. The DC component is removed from $s_{lv}(n)$ to obtain $\tilde{s}_{lv}(n)$. Then,
the autocorrelation function is computed by:

$$r_k = \frac{\sum_{n=0}^{M-1} \tilde{s}_{lv}(n)\,\tilde{s}_{lv}(n+k)}{\sqrt{\sum_{n=0}^{M-1} \tilde{s}_{lv}^{2}(n)\,\sum_{n=0}^{M-1} \tilde{s}_{lv}^{2}(n+k)}}, \qquad k = 20, \ldots, 150$$

where M = 70. The samples are selected using a sliding window chosen to align the current
computation position to the center of the autocorrelation window. The maximum correlation
coefficient parameter corx is the maximum of the function $r_k$. The corresponding pitch is $l$.

$$\mathrm{corx} = \max_{20 \le k \le 150} r_k, \qquad l = \arg\max_{20 \le k \le 150} r_k$$

(5) Maximum correlation coefficient of low pass filtered speech:

In the standard MELP, five filters are used in bandpass voicing analysis. The first filter is
actually a low-pass filter with a passband of 0-500 Hz. The same filter is used on the input speech to
generate the low-pass filtered signal $s_l(n)$. Then the correlation function defined in (4) is
computed on $s_l(n)$. The range of the indices is limited to [max(20, l - 5), min(150, l + 5)]. The
maximum of the correlation function is denoted as lowBandCorx.
(6) Low band energy and high band energy:

In the LPC analysis module, the first 17 autocorrelation coefficients r(n), n = 0, ..., 16 are
computed. The low band energy and high band energy are obtained by filtering the
autocorrelation coefficients.

$$\mathrm{lowBandEn} = r(0)\,C_l(0) + 2\sum_{n=1}^{16} r(n)\,C_l(n)$$

$$\mathrm{highBandEn} = r(0)\,C_h(0) + 2\sum_{n=1}^{16} r(n)\,C_h(n)$$

The $C_l(n)$ and $C_h(n)$ are the coefficients for the low pass filter and the high pass filter. The 16 filter
coefficients for each filter are chosen for a cutoff frequency of 2 kHz and are obtained with a
standard FIR filter design technique.

The parameters enumerated above are used to make rough U/V decisions for each half 
frame. The classification logic for making the voicing decisions shown below is performed in 
the pitch smoother module 24. The voicedEn and silenceEn are the running average energies of 
voiced frames and silence frames. 

struct {
    float subEnergy;     /* energy in dB */
    float zeroCrosRate;  /* zero crossing rate */
    float peakiness;     /* peakiness measurement */
    float corx;          /* maximum correlation coefficient of input speech */
    float lowBandCorx;   /* maximum correlation coefficient of
                            500Hz low pass filtered speech */
    float lowBandEn;     /* energy of low pass filtered speech */
    float highBandEn;    /* energy of high pass filtered speech */
} classStat[9];

/* Below, classStat refers to the entry for the current computation position
   and classStat[-1] to the entry for the preceding position. */
if( classStat->subEnergy < 30 ){
    classy = SILENCE;
}else if( classStat->subEnergy < 0.35*voicedEn + 0.65*silenceEn ){
    if( (classStat->zeroCrosRate > 0.6) &&
        ((classStat->corx < 0.4) || (classStat->lowBandCorx < 0.5)) )
        classy = UNVOICED;
    else if( (classStat->lowBandCorx > 0.7) ||
        ((classStat->lowBandCorx > 0.4) && (classStat->corx > 0.7)) )
        classy = VOICED;
    else if( (classStat->zeroCrosRate - classStat[-1].zeroCrosRate > 0.3) ||
        (classStat->subEnergy - classStat[-1].subEnergy > 20) ||
        (classStat->peakiness > 1.6) )
        classy = TRANSITION;
    else if( (classStat->zeroCrosRate > 0.55) ||
        ((classStat->highBandEn > classStat->lowBandEn - 5) &&
         (classStat->zeroCrosRate > 0.4)) )
        classy = UNVOICED;
    else
        classy = SILENCE;
}else{
    if( (classStat->zeroCrosRate - classStat[-1].zeroCrosRate > 0.2) ||
        (classStat->subEnergy - classStat[-1].subEnergy > 20) ||
        (classStat->peakiness > 1.6) ){
        if( (classStat->lowBandCorx > 0.7) || (classStat->corx > 0.8) )
            classy = VOICED;
        else
            classy = TRANSITION;
    }else if( classStat->zeroCrosRate < 0.2 ){
        if( (classStat->lowBandCorx > 0.5) ||
            ((classStat->lowBandCorx > 0.3) && (classStat->corx > 0.6)) )
            classy = VOICED;
        else if( classStat->subEnergy > 0.7*voicedEn + 0.3*silenceEn ){
            if( classStat->peakiness > 1.5 )
                classy = TRANSITION;
            else{
                classy = VOICED;
            }
        }else{
            classy = SILENCE;
        }
    }else if( classStat->zeroCrosRate < 0.5 ){
        if( (classStat->lowBandCorx > 0.55) ||
            ((classStat->lowBandCorx > 0.3) && (classStat->corx > 0.65)) )
            classy = VOICED;
        else if( (classStat->subEnergy < 0.4*voicedEn + 0.6*silenceEn) &&
            (classStat->highBandEn < classStat->lowBandEn - 10) )
            classy = SILENCE;
        else if( classStat->peakiness > 1.4 )
            classy = TRANSITION;
        else
            classy = UNVOICED;
    }else if( classStat->zeroCrosRate < 0.7 ){
        if( ((classStat->lowBandCorx > 0.6) && (classStat->corx > 0.3)) ||
            ((classStat->lowBandCorx > 0.4) && (classStat->corx > 0.7)) )
            classy = VOICED;
        else if( classStat->peakiness > 1.5 )
            classy = TRANSITION;
        else
            classy = UNVOICED;
    }else{
        if( ((classStat->lowBandCorx > 0.65) && (classStat->corx > 0.3)) ||
            ((classStat->lowBandCorx > 0.45) && (classStat->corx > 0.7)) )
            classy = VOICED;
        else if( classStat->peakiness > 2.0 )
            classy = TRANSITION;
        else
            classy = UNVOICED;
    }
}

The U/V decisions for each subframe are then used to classify the frames as onset or 
offset. This classification is internal to the encoder and is not transmitted. For each current 
frame, first the possibility of an offset is checked. An offset frame is selected if the current 
voiced frame is followed by a sequence of unvoiced frames, or the energy declines at least 8 dB 
within one frame or 12 dB within one and one-half frames. The pitch of an offset frame is not 
smoothed. 

If the current frame is the first voiced frame, or the energy increases by at least 8 dB 
within one frame or 12 dB within one and one-half frames, the current frame is classified as an 
onset frame. For the onset frames, a look-ahead pitch candidate is estimated from one of the 
local maximums of the autocorrelation function evaluated in the look-ahead region. First, the 8 
largest local maximums of the autocorrelation function given above are selected. The
maximums are denoted for the current computation position as $R^{(0)}(i)$, $i = 0, \ldots, 7$. The
maximums for the next two computation positions are $R^{(1)}(i)$ and $R^{(2)}(i)$. A cost function for each
computation position is computed, and the cost function for the current computation position is
used to estimate the predicted pitch. The cost function for $R^{(2)}(i)$ is computed first as:

$$C^{(2)}(i) = W\left[1 - R^{(2)}(i)\right]$$

where $W$ is a constant which is 100. For each maximum $R^{(1)}(i)$, the corresponding pitch is
denoted as $p^{(1)}(i)$. The cost function $C^{(1)}(i)$ is computed as:

$$C^{(1)}(i) = W\left[1 - R^{(1)}(i)\right] + \left|p^{(1)}(i) - p^{(2)}(k_i)\right| + C^{(2)}(k_i)$$

The index $k_i$ is chosen as:

$$k_i = \arg\max_{l}\, R^{(2)}(l), \qquad \left|p^{(2)}(l) - p^{(1)}(i)\right| / p^{(1)}(i) < 0.2$$

If the range for $l$ is an empty set in the above equation, then the range $l \in [0, 7]$ is used. The cost
function $C^{(0)}(i)$ is computed in a similar way as $C^{(1)}(i)$. The predicted pitch is chosen as

$$p = \arg\max\left(C^{(0)}(i)\right), \qquad i = 0, \ldots, 7$$

The look-ahead pitch candidate is selected as the current pitch if the difference between the original
pitch estimate and the look-ahead pitch is larger than 15%.

If the current frame is neither offset nor onset, the pitch variation is checked. If a pitch 
jump is detected, which means the pitch decreases and then increases or increases and then 
decreases, the pitch of the current frame is smoothed using interpolation between the pitch of the 
previous frame and the pitch of the next frame. For the last frame in the superframe the pitch of 
the next frame is not available, therefore a predicted pitch value is used instead of the next frame 
pitch value. The above pitch smoother detects many of the large pitch errors that would otherwise
occur, and in formal subjective quality tests the pitch smoother provided significant quality
improvement.
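The pitch jump case described above can be sketched in C as follows. This is only an illustration under assumptions: the function name is hypothetical, the text does not state how large a reversal must be to count as a jump (the JUMP_RATIO threshold below is a placeholder, not a disclosed value), and the interpolation is taken to be the midpoint between the previous and next frame pitch.

```c
#include <math.h>

/* Hypothetical threshold: relative change that counts as a jump.
 * The description gives no value; 0.15 is a placeholder. */
#define JUMP_RATIO 0.15

/* Returns the pitch to use for the current frame. `next` is the pitch of
 * the next frame, or a predicted pitch when the next frame is unavailable
 * (last frame of the superframe). */
double smooth_pitch_jump(double prev, double cur, double next)
{
    double d1 = cur - prev;
    double d2 = next - cur;

    /* Direction reverses: up then down, or down then up. */
    int reverses = (d1 > 0.0 && d2 < 0.0) || (d1 < 0.0 && d2 > 0.0);

    /* Both changes are large relative to the neighboring pitch values. */
    int large = fabs(d1) > JUMP_RATIO * prev && fabs(d2) > JUMP_RATIO * next;

    if (reverses && large)
        return 0.5 * (prev + next);   /* midpoint interpolation (assumed) */
    return cur;
}
```

A pitch track such as 50, 100, 52 is treated as a jump and the middle value is replaced by 51, whereas a smooth track such as 50, 51, 52 is left unchanged.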

2.3 Bandpass Voicing Smoother 

In MELP encoding, the input speech is filtered into five subbands. Bandpass voicing
strengths are computed for each of these subbands with each voicing strength normalized to a
value between 0 and 1. These strengths are subsequently quantized to 0s or 1s to obtain
bandpass voicing decisions. The quantized lowband (0 to 500 Hz) voicing strength determines
the unvoiced or voiced (U/V) character of the frame. The binary voicing information of the
remaining four bands partially describes the harmonic or nonharmonic character of the spectrum
of a frame and can be represented by a four bit codeword. In this invention, a bandpass voicing
smoother is used to more compactly describe this information for each frame in a superframe and
to smooth the time evolution of this information across frames. First, the four bit codeword
(1 for voiced, 0 for unvoiced) for the remaining four bands of each frame is mapped into a single
cutoff frequency with one of four allowed values. This cutoff frequency approximately identifies
the boundary between the lower region of the spectrum that has a voiced (or harmonic) character
and the higher region that has an unvoiced character. The smoother then modifies the three
cutoff frequencies in the superframe to produce a more natural time evolution for the spectral
character of the frames. The 4-bit binary voicing codeword for each of the frame decisions is
mapped into one of four codewords using the 2-bit codebook shown in Table 2. The entries of the
codebook are equivalent to the four cutoff frequencies 500 Hz, 1000 Hz, 2000 Hz and 4000 Hz,
which correspond respectively to the columns labeled 0000, 1000, 1100, and 1111 in the
mapping table given in Table 2. For example, when the bandpass voicing pattern for a voiced
frame is 1001, this index is mapped into 1000, which corresponds to a cutoff frequency of 1000
Hz.

For the first two frames of the current superframe, the cutoff frequency is smoothed 
according to the bandpass voicing information of the previous frame and the next frame. The 
cutoff frequency in the third frame is left unchanged. The average energy of voiced frames is 
denoted as VE. The value of VE is updated at each voiced frame for which the two prior frames 
are voiced. The updating rule is: 

$$VE_{new} = 10\log_{10}\left[0.9\,e^{VE_{old}/10} + 0.1\,e^{subEnergy/10}\right]$$

For the frame $i$, the energy of the current frame is denoted as $en_i$. The voicing strengths
for the five bands are denoted as $bp[k]_i$, $k = 1, \ldots, 5$. The following three conditions are
considered to smooth the cutoff frequency $f_i$.

(1) If the cutoff frequencies of the previous frame and the next frame are both above
2000 Hz, then execute the following procedure.

If ( $f_i$ < 2000 and ( ($en_i$ > VE - 5 dB) or ($bp[2]_{i-1}$ > 0.5 and $bp[3]_{i-1}$ > 0.5) ) )
    $f_i$ = 2000 Hz
else if ( $f_i$ < 1000 )
    $f_i$ = 1000 Hz

(2) If the cutoff frequencies of the previous frame and the next frame are both above
1000 Hz, then execute the following procedure.

If ( $f_i$ < 1000 and ( ($en_i$ > VE - 10 dB) or ($bp[2]_{i-1}$ > 0.4) ) )
    $f_i$ = 1000 Hz

(3) If the cutoff frequencies of the previous frame and the next frame are both below
1000 Hz, then execute the following procedure.

If ( $f_i$ > 2000 and $en_i$ < VE - 5 dB and $bp[3]_{i-1}$ < 0.7 )
    $f_i$ = 2000 Hz
3. QUANTIZATION 
3.1 Overview 

The transmitted parameters of the 1.2 kbps coder are the same as those of the 2.4 kbps
MELP coder except that in the 1.2 kbps coder the parameters are not transmitted frame by frame
but are sent once for each superframe. The bit allocation is shown in Table 1. New quantization
schemes were designed to take advantage of the long block size (the superframe) by using
interpolation and vector quantization (VQ). The statistical properties of voiced and unvoiced
speech are also taken into account. The same Fourier magnitude codebook of the 2.4 kbps MELP
coder is used in the 1.2 kbps coder in order to save memory and to make transcoding
easier.

3.2 Pitch Quantization 

The pitch parameters are applicable only for voiced frames. Different pitch quantization
schemes are used for different U/V combinations across the three frames. The detailed method
for quantizing the pitch values of a superframe is herein described for a particular voicing
pattern. The quantization method described in this section is used in the joint quantization of the
voicing pattern and the pitch, which will be described in the following section. The pitch
quantization schemes are summarized in Table 3. Within those superframes where the voicing
pattern contains either two or three voiced frames, the pitch parameters are vector-quantized.
For voicing patterns containing only one voiced frame, the scalar quantizer specified in the
MELP standard is applied for the pitch of the voiced frame. For the UUU voicing pattern, where
each frame is unvoiced, no bits are needed for pitch information. Note that U denotes
"Unvoiced" and V denotes "Voiced".

Each pitch value, P, obtained from the pitch analysis of the 2.4 kbps standard is
transformed into a logarithmic value, p = log P, before quantization. For each superframe, a
pitch vector is constructed with components equal to the log pitch value for each voiced frame
and a zero value for each unvoiced frame. For voicing patterns with two or three voiced frames,
the pitch vector is quantized using a VQ (Vector Quantization) algorithm with a new distortion
measure that takes into account the evolution of the pitch. A standard VQ codebook design is
used [7]. The VQ encoding algorithm incorporates
pitch differentials in the codebook search, which makes it possible to consider the time evolution
of the pitch in selecting the VQ codebook entry. This feature is motivated by the perceptual
importance of adequately tracking the pitch trajectory. The algorithm has three steps for
obtaining the best index:

Step 1: Select the M-best candidates using the weighted squared Euclidean distance
measure:

$$d = \sum_{i=1}^{3} w_i \left|p_i - \hat{p}_i\right|^2 \qquad (1)$$

where

$$w_i = \begin{cases} 1, & \text{if the corresponding frame is voiced} \\ 0, & \text{if the corresponding frame is unvoiced,} \end{cases}$$

$p_i$ is the unquantized log pitch, and $\hat{p}_i$ is the quantized log pitch value. The above equation
indicates that only voiced frames are taken into consideration in the codebook search.
Step 2: Calculate differentials of the unquantized log pitch values using:

$$\Delta p_i = \begin{cases} p_i - p_{i-1}, & \text{if the } i\text{-th and } (i-1)\text{-th frames are voiced} \\ 0, & \text{else} \end{cases} \qquad (2)$$

for i = 1, 2, 3, where $p_0$ is the last log pitch value of the previous superframe. For the candidate
log pitch values selected in Step 1, calculate differentials of the candidates by replacing $\Delta p_i$ and
$p_i$ by $\Delta\hat{p}_i$ and $\hat{p}_i$ respectively in equation (2), where $\hat{p}_0$ is the quantized version of $p_0$.
Step 3: Select the index from the M best candidates that minimizes:

$$d' = \sum_{i=1}^{3} w_i\left|p_i - \hat{p}_i\right|^2 + \delta\sum_{i=1}^{3}\left|\Delta p_i - \Delta\hat{p}_i\right|^2 = d + \delta\sum_{i=1}^{3}\left|\Delta p_i - \Delta\hat{p}_i\right|^2 \qquad (3)$$

where $\delta$ is a parameter that controls the contribution of the pitch differentials and is set to 1.
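The three-step search can be sketched in C as shown below. This is an illustrative sketch and not the disclosed implementation: the function and macro names are hypothetical, the number of retained candidates (M_BEST) is an assumed value because the text does not state M, and the codebook is assumed to be stored as a flat array of three log pitch values per entry. Index 0 of the p and v arrays holds the last frame of the previous superframe.

```c
#include <float.h>
#include <stddef.h>

#define FRAMES 3
#define M_BEST 4   /* hypothetical; the description does not state M */

/* Eq. (2): differential is nonzero only when frames i and i-1 are voiced. */
static double delta_p(const double p[FRAMES + 1], const int v[FRAMES + 1], int i)
{
    return (v[i] && v[i - 1]) ? p[i] - p[i - 1] : 0.0;
}

/* p[0..3]: unquantized log pitch (p[0] from the previous superframe).
 * v[0..3]: voicing flags (1 = voiced). cb: cb_size entries of FRAMES values.
 * q0: quantized version of p[0]. delta: weight of the differential term. */
size_t pitch_vq_search(const double p[FRAMES + 1], const int v[FRAMES + 1],
                       const double *cb, size_t cb_size,
                       double q0, double delta)
{
    size_t cand[M_BEST];
    double cand_d[M_BEST];
    size_t ncand = 0;

    /* Step 1: keep the M_BEST entries with the smallest distance d, eq. (1). */
    for (size_t c = 0; c < cb_size; c++) {
        double d = 0.0;
        for (int i = 1; i <= FRAMES; i++) {
            double e = p[i] - cb[c * FRAMES + (size_t)(i - 1)];
            if (v[i])
                d += e * e;
        }
        size_t pos = ncand;
        while (pos > 0 && cand_d[pos - 1] > d)
            pos--;
        if (pos < M_BEST) {
            size_t last = (ncand < M_BEST) ? ncand : M_BEST - 1;
            for (size_t k = last; k > pos; k--) {
                cand[k] = cand[k - 1];
                cand_d[k] = cand_d[k - 1];
            }
            cand[pos] = c;
            cand_d[pos] = d;
            if (ncand < M_BEST)
                ncand++;
        }
    }

    /* Steps 2 and 3: add the differential term and pick the minimum d'. */
    size_t winner = 0;
    double winner_d = DBL_MAX;
    for (size_t k = 0; k < ncand; k++) {
        const double *e = cb + cand[k] * FRAMES;
        double q[FRAMES + 1] = { q0, e[0], e[1], e[2] };
        double d = cand_d[k];
        for (int i = 1; i <= FRAMES; i++) {
            double t = delta_p(p, v, i) - delta_p(q, v, i);
            d += delta * t * t;
        }
        if (d < winner_d) {
            winner_d = d;
            winner = cand[k];
        }
    }
    return winner;
}
```

Setting delta to 0 reduces the search to a plain weighted nearest-neighbor search, which makes the effect of the differential term easy to compare on a given codebook.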

For superframes that contain only one voiced frame, scalar quantization of the pitch is
performed. The pitch value is quantized on a logarithmic scale with a 99-level uniform quantizer
ranging from 20 to 160 samples. The quantizer is the same as that in the 2.4 kbps MELP
standard, where the 99 levels are mapped to a 7-bit pitch codeword and the 28 unused codewords
with Hamming weight 1 or 2 are used for error protection.

3.3 Joint Quantization of Pitch and U/V Decisions 

The U/V decisions and pitch parameters for each superframe are jointly quantized using
12 bits. The joint quantization scheme is summarized in Table 4. In other words, the voicing
pattern or mode (one of 8 possible patterns) and the set of three pitch values for the superframe
form the input to a joint quantization scheme whose output is a 12-bit word. The decoder
subsequently maps this 12-bit word by means of a table lookup into a particular voicing pattern
and a quantized set of 3 pitch values.

In this scheme, the allocation of 12 bits consists of 3 mode bits (representing the 8
possible combinations of U/V decisions for the 3 frames in a superframe) and the remaining 9
bits for pitch values. The scheme employs six separate pitch codebooks, five having 9 bits (i.e.,
512 entries each) and one being the scalar quantizer as indicated in Table 4; the specific
codebook is determined according to the bit patterns of the 3-bit codeword representing the
quantized voicing pattern. Therefore the U/V voicing pattern is first encoded into a 3-bit
codeword as shown in Table 4, which is then used to select one of the 6 codebooks shown. The
ordered set of 3 pitch values is then vector quantized with the selected codebook to generate a
9-bit codeword that identifies the quantized set of 3 pitch values. Note that four codebooks are
assigned to the superframes in the VVV (voiced-voiced-voiced) mode, which means that the
pitch vectors in the VVV type superframes are each quantized by one of 2048 codewords. If the
number of voiced frames in the superframe is not larger than one, the 3-bit codeword is set to
000 and the distinction between different modes is determined within the 9-bit codebook. Note
that the latter case consists of the 4 modes UUU, VUU, UVU, and UUV (where U denotes an
unvoiced frame and V a voiced frame and the three symbols indicate the voicing status of the
ordered set of 3 frames in a superframe). In this case, the 9 available bits are more than
sufficient to represent the mode information as well as the pitch value since there are 3 modes
with 128 pitch values and one mode with no pitch value.

3.4 Parity Bit 

To improve robustness to transmission errors, a parity check bit is computed and 
transmitted for the three mode bits (representing voicing patterns) in the superframe as defined 
above in Section 3.3. 
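A parity check over the three mode bits can be sketched in C as follows. The function names are hypothetical, and the use of even parity is an assumption, since the text does not state the polarity of the parity bit.

```c
/* Even parity over the three mode bits (polarity is an assumption):
 * returns 1 when an odd number of the three bits are set. */
unsigned mode_parity(unsigned mode_bits)
{
    unsigned m = mode_bits & 0x7u;
    return (m ^ (m >> 1) ^ (m >> 2)) & 1u;
}

/* Returns 1 when the received mode bits agree with the received parity bit,
 * and 0 when an odd number of these four bits was corrupted. */
int mode_parity_ok(unsigned mode_bits, unsigned parity_bit)
{
    return mode_parity(mode_bits) == (parity_bit & 1u);
}
```

A single parity bit detects any single bit error among the mode bits and the parity bit itself but cannot locate or correct it.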

3.5 LSF Quantization 

The bit allocation for quantizing the line spectral frequencies (LSF's) is shown in Table
5, with the original LSF vectors for the three frames denoted by $l_1$, $l_2$, $l_3$. For the UUU, UUV,
UVU and VUU modes, the LSF vectors of unvoiced frames are quantized using a 9-bit
codebook, while the LSF vector of the voiced frame is quantized with a 24-bit multistage VQ
(MSVQ) quantizer based on the approach described in [8].

The LSF vectors for the other U/V patterns are encoded using the following
forward-backward interpolation scheme. This scheme works as follows: The quantized LSF vector of
the previous frame is denoted by $\hat{l}_p$. First the LSF vector of the last frame in the current superframe,
$l_3$, is directly quantized to $\hat{l}_3$ using the 9-bit codebook for unvoiced frames or the 24-bit MSVQ
for voiced frames. Predicted values of $l_1$ and $l_2$ are then obtained by interpolating $\hat{l}_p$ and $\hat{l}_3$
using the following equations:

$$\tilde{l}_1(j) = a_1(j)\,\hat{l}_p(j) + \left[1 - a_1(j)\right]\hat{l}_3(j)$$

$$\tilde{l}_2(j) = a_2(j)\,\hat{l}_p(j) + \left[1 - a_2(j)\right]\hat{l}_3(j), \qquad j = 1, \ldots, 10 \qquad (4)$$

where $a_1(j)$ and $a_2(j)$ are the interpolation coefficients.
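The interpolation of equation (4) can be sketched in C as follows. The function name is hypothetical, and the arrays are indexed from 0 to 9 rather than from 1 to 10 as in the text; the same routine would be called once with the coefficients for frame 1 and once with those for frame 2.

```c
#define LSF_ORDER 10

/* Eq. (4): predict one frame's LSF vector from the quantized LSF vector of
 * the previous frame (lp_q) and of the last frame of the current
 * superframe (l3_q), using per-component interpolation coefficients a. */
void lsf_interpolate(const double a[LSF_ORDER],
                     const double lp_q[LSF_ORDER],
                     const double l3_q[LSF_ORDER],
                     double pred[LSF_ORDER])
{
    for (int j = 0; j < LSF_ORDER; j++)
        pred[j] = a[j] * lp_q[j] + (1.0 - a[j]) * l3_q[j];
}
```

A coefficient of 1 reproduces the previous frame's value and a coefficient of 0 reproduces the value of the last frame of the superframe, so each component of the prediction lies between the two whenever the coefficient lies between 0 and 1.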

The design of the MSVQ (multistage vector quantization) codebooks follows the 
procedure explained in [8]. 

The coefficients are stored in a codebook and the best coefficients are selected by
minimizing the distortion measure:

$$e = \sum_{j=1}^{10} w_1(j)\left|l_1(j) - \tilde{l}_1(j)\right|^2 + \sum_{j=1}^{10} w_2(j)\left|l_2(j) - \tilde{l}_2(j)\right|^2 \qquad (5)$$

where the coefficients $w_i(j)$ are the same as in the 2.4 kbps MELP standard. After obtaining the
best interpolation coefficients, the residual LSF vectors for frames 1 and 2 are computed by:

$$r_1(j) = l_1(j) - \tilde{l}_1(j)$$

$$r_2(j) = l_2(j) - \tilde{l}_2(j), \qquad j = 1, \ldots, 10$$

The 20-dimension residual vector $R = [r_1(1), r_1(2), \ldots, r_1(10), r_2(1), r_2(2), \ldots, r_2(10)]$ is then
quantized using weighted multi-stage vector quantization.

3.6 Method for Designing the Interpolation Codebook

The interpolation coefficients were obtained as follows. The optimal interpolation
coefficients for each superframe were computed by minimizing the weighted mean square error
between $l_1$, $l_2$ and $\tilde{l}_1$, $\tilde{l}_2$, which can be shown to result in:

$$a_1(j) = \frac{\hat{l}_3(j) - l_1(j)}{\hat{l}_3(j) - \hat{l}_p(j)} \qquad (6)$$

$$a_2(j) = \frac{\hat{l}_3(j) - l_2(j)}{\hat{l}_3(j) - \hat{l}_p(j)}, \qquad j = 1, \ldots, 10 \qquad (7)$$

Each entry of the training database for the codebook design employs the 40-dimension vector
$(\hat{l}_p, l_1, l_2, \hat{l}_3)$ and the training procedure described below.

The database is denoted as $L = \{(\hat{l}_{p,n}, l_{1,n}, l_{2,n}, \hat{l}_{3,n}),\ n = 0, 1, \ldots, N-1\}$, where $(\hat{l}_{p,n}, l_{1,n}, l_{2,n}, \hat{l}_{3,n})$ =
$[\hat{l}_{p,n}(1), \ldots, \hat{l}_{p,n}(10), l_{1,n}(1), \ldots, l_{1,n}(10), l_{2,n}(1), \ldots, l_{2,n}(10), \hat{l}_{3,n}(1), \ldots, \hat{l}_{3,n}(10)]$ is a 40-dimension vector.

The output codebook is $C = \{(a_{1,m}, a_{2,m}),\ m = 0, \ldots, M-1\}$, where $(a_{1,m}, a_{2,m})$ =
$[a_{1,m}(1), \ldots, a_{1,m}(10), a_{2,m}(1), \ldots, a_{2,m}(10)]$ is a 20-dimension vector.

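3.6.1 The two main procedures of the codebook training are now described. Given the
codebook $C = \{(a_{1,m}, a_{2,m}),\ m = 0, \ldots, M-1\}$, each database entry $L_n = (\hat{l}_{p,n}, l_{1,n}, l_{2,n}, \hat{l}_{3,n})$ is
associated to a particular centroid. The equation below is used to compute the error function
between the entry (input vector) and each centroid in the codebook. The entry $L_n$ is associated to
the centroid which gives the smallest error. This step defines a partition on the input vectors.

$$\varepsilon = \sum_{j=1}^{10} w_1(j)\left\{l_{1,n}(j) - \left[a_{1,m}(j)\,\hat{l}_{p,n}(j) + \left(1 - a_{1,m}(j)\right)\hat{l}_{3,n}(j)\right]\right\}^2 + \sum_{j=1}^{10} w_2(j)\left\{l_{2,n}(j) - \left[a_{2,m}(j)\,\hat{l}_{p,n}(j) + \left(1 - a_{2,m}(j)\right)\hat{l}_{3,n}(j)\right]\right\}^2 \qquad (8)$$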

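3.6.2 Given a particular partition, the codebook is updated. Assume N' database
entries are associated to the centroid $A_m = (a_{1,m}, a_{2,m})$; then the centroid is updated using the
following equation:

$$a_{1,m}(j) = \frac{\sum_{n=0}^{N'-1} w_1(j)\left[\hat{l}_{3,n}(j) - l_{1,n}(j)\right]\left[\hat{l}_{3,n}(j) - \hat{l}_{p,n}(j)\right]}{\sum_{n=0}^{N'-1} w_1(j)\left[\hat{l}_{3,n}(j) - \hat{l}_{p,n}(j)\right]^2}$$

$$a_{2,m}(j) = \frac{\sum_{n=0}^{N'-1} w_2(j)\left[\hat{l}_{3,n}(j) - l_{2,n}(j)\right]\left[\hat{l}_{3,n}(j) - \hat{l}_{p,n}(j)\right]}{\sum_{n=0}^{N'-1} w_2(j)\left[\hat{l}_{3,n}(j) - \hat{l}_{p,n}(j)\right]^2}, \qquad j = 1, \ldots, 10 \qquad (9)$$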
The interpolation coefficients codebook was trained and tested for several codebook sizes. A 
codebook with 16 entries was found to be quite efficient. The above procedure is readily 
understood by engineers familiar with the general concepts of vector quantization and codebook 
design as described in [7]. 
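Both the selection of the best interpolation coefficients in the encoder and the nearest-centroid step of the codebook training reduce to evaluating a weighted squared error for every codebook entry and keeping the smallest. The following C sketch illustrates that search under stated assumptions: the function names are hypothetical, the codebook is assumed to be stored as a flat array with 20 coefficients per entry (ten for frame 1 followed by ten for frame 2), and arrays are indexed from 0 to 9.

```c
#include <stddef.h>

#define LSF_ORDER 10

/* Weighted squared error between one frame's LSF vector l and its
 * prediction interpolated from lp_q and l3_q with coefficients a. */
static double frame_error(const double *w, const double *l, const double *a,
                          const double *lp_q, const double *l3_q)
{
    double e = 0.0;
    for (int j = 0; j < LSF_ORDER; j++) {
        double pred = a[j] * lp_q[j] + (1.0 - a[j]) * l3_q[j];
        double d = l[j] - pred;
        e += w[j] * d * d;
    }
    return e;
}

/* Returns the index of the codebook entry with the smallest total error
 * over frames 1 and 2. cb holds cb_size entries of 2*LSF_ORDER values. */
size_t best_interp_index(const double *cb, size_t cb_size,
                         const double *w1, const double *w2,
                         const double *l1, const double *l2,
                         const double *lp_q, const double *l3_q)
{
    size_t best = 0;
    double best_e = -1.0;   /* negative means "no candidate yet" */
    for (size_t m = 0; m < cb_size; m++) {
        const double *a1 = cb + m * 2 * LSF_ORDER;
        const double *a2 = a1 + LSF_ORDER;
        double e = frame_error(w1, l1, a1, lp_q, l3_q)
                 + frame_error(w2, l2, a2, lp_q, l3_q);
        if (best_e < 0.0 || e < best_e) {
            best_e = e;
            best = m;
        }
    }
    return best;
}
```

With the 16-entry codebook mentioned above, the returned index fits in 4 bits.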

3.7 Gain Quantization 

In the 1.2 kbps coder, two gain parameters are calculated per frame, with 6 gains per
superframe. The 6 gain parameters are vector-quantized using a 10 bit vector quantizer with a 
MSE criterion defined in the logarithmic domain. 

3.8 Bandpass Voicing Quantization 

The voicing information for the lowest band out of the total of 5 bands is determined 
from the U/V decision. The voicing decisions of the remaining 4 bands are employed only for 
voiced frames. The binary voicing decisions (1 for voiced and 0 for unvoiced) of the 4 bands are 
quantized using the 2-bit codebook shown in Table 2. This procedure results in two bits being 
used for voicing in each voiced frame. The bit allocation required in different coding modes for 
bandpass voicing quantization is shown in Table 6. 

3.9 Quantization of Fourier Magnitudes 






The Fourier magnitude vector is computed only for voiced frames. The quantization
procedure for Fourier magnitudes is summarized in Table 7. The unquantized Fourier magnitude
vectors for the three frames in a superframe are denoted as $f_i$, i = 1, 2, 3. Denoted by $f_0$ is the
Fourier magnitude vector of the last frame in the previous superframe, $\hat{f}_i$ denotes the quantized
vector $f_i$, and Q(.) denotes the quantizer function for the Fourier magnitude vector when using
the same 8-bit codebook as used within the MELP standard. The quantized Fourier magnitude
vectors for the three frames in a superframe are obtained as shown in Table 7.
3.10 Aperiodic flag quantization 

The 1.2 kbps coder uses 1 bit per superframe for the quantization of the aperiodic flag.
In the 2.4 kbps MELP standard, the aperiodic flag requires one bit per frame, which is three bits
per superframe. The compression to one bit per superframe is obtained using the quantization
procedure shown in Table 8. In the table, "J" and "-" indicate respectively the aperiodic flag
states of set and not set.

3.11 Error Protection

3.11.1 Mode protection

Aside from the parity bit, additional mode error protection techniques are applied to
superframes by employing the spare bits that are available in all superframes except the
superframes in the VVV mode. The 1.2 kbps coder uses two bits for the quantization of the
bandpass voicing for each voiced frame. Hence, in superframes that have one unvoiced frame,
two bandpass voicing bits are spare and can be used for mode protection. In superframes that
have two unvoiced frames, four bits can be used for mode protection. In addition, 4 bits of LSF
quantization are used for mode protection in the UUU and VVU modes. Table 9 shows how
these mode protection bits are used. Mode protection implies protection of the coding state,
which was described in Section 1.1.

3.11.2 Forward Error Correction for UUU Superframe 

In the UUU mode, the first 8 MSBs of the gain index are divided into two groups of 4
bits and each group is protected by the Hamming (8,4) code. The remaining 2 bits of the gain
index are protected with the Hamming (7,4) code. Note that the Hamming (7,4) code corrects
single bit errors, while the (8,4) code corrects single bit errors and in addition detects double
bit errors. The LSF bits for each frame in the UUU superframes are protected by a cyclic
redundancy check (CRC) with a CRC (13,9) code which detects single and double bit errors.
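
The difference between the two Hamming codes can be shown with a textbook construction. The sketch below is not taken from the coder itself: the bit ordering (parity bits in positions 1, 2 and 4), the function names, and the use of `None` to signal an uncorrectable word are all assumptions, and the way the 2 remaining gain bits are padded to 4 data bits is not covered.

```python
def hamming74_encode(data):
    """Encode 4 data bits as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def _syndrome(word7):
    """Return the 1-based position of a single bit error, or 0 if none."""
    s1 = word7[0] ^ word7[2] ^ word7[4] ^ word7[6]
    s2 = word7[1] ^ word7[2] ^ word7[5] ^ word7[6]
    s3 = word7[3] ^ word7[4] ^ word7[5] ^ word7[6]
    return s1 + 2 * s2 + 4 * s3


def hamming74_decode(word7):
    """Correct at most one bit error and return the 4 data bits."""
    word = list(word7)
    position = _syndrome(word)
    if position:
        word[position - 1] ^= 1
    return [word[2], word[4], word[5], word[6]]


def hamming84_encode(data):
    """Hamming (7,4) codeword followed by one overall parity bit."""
    word = hamming74_encode(data)
    return word + [sum(word) % 2]


def hamming84_decode(word8):
    """Correct one bit error; return None when a double error is detected."""
    word = list(word8)
    position = _syndrome(word[:7])
    overall = sum(word) % 2  # 0 for a valid codeword
    if position and not overall:
        return None  # nonzero syndrome with even parity: two bits flipped
    if position:
        word[position - 1] ^= 1
    return [word[2], word[4], word[5], word[6]]
```

In a decoder following the text, a `None` result from the (8,4) code would correspond to the uncorrectable case that triggers a frame erasure.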
4. DECODER 

4.1 Bit Unpacking and Error Correction 

Within the decoder, the received bits are unpacked from the channel and assembled into 
parameter codewords. Since the decoding procedures for most parameters depend on the mode 
(the U/V pattern), the 12 bits allocated for pitch and U/V decisions are decoded first. For the bit 
pattern 000 in the 3-bit codebook, the 9-bit codeword specifies one of the UUU, UUV, UVU, 
and VUU modes. If the code of the 9-bit codebook is all-zeros, or has one bit set, the UUU 
mode is used. If the code has two bits set, or specifies an index unused for pitch, a frame erasure 
is indicated. 

After decoding the U/V pattern, the resulting mode information is checked using the
parity bit and the mode protection bits. If an error is detected, a mode correction algorithm is
performed. The algorithm attempts to correct the mode error using the parity bit and mode
protection bits. In the case that an uncorrectable error is detected, different decoding methods
are applied for each parameter according to the mode error patterns. In addition, if a parity error
is found, a parameter-smoothing flag is set. The correction procedures are described in Table 10.

In the UUU mode, assuming no errors were detected in the mode information, the two
(8,4) Hamming codes representing the gain parameters are decoded to correct single bit errors
and detect double errors. If an uncorrectable error is detected, a frame erasure is indicated.
Otherwise the (7,4) Hamming code for gain and the (13,9) CRC (cyclic redundancy check) codes
for LSF's are decoded to correct single errors and detect single and double errors, respectively.
If an error is found in the CRC (13,9) codes, the incorrect LSF's are replaced by repeating
previous LSF's or interpolating between the neighboring correct LSF's.

If a frame erasure is detected in the current superframe by the Hamming decoder, or an
erasure is directly signaled from the channel, a frame repeat mechanism is implemented. All the
parameters of the current superframe are replaced with the parameters from the last frame of the
previous superframe.

For a superframe in which an erasure is not detected, the remaining parameters are 
decoded. If smoothing is necessary, the post-smoothing parameter is obtained by: 

$$\tilde{x} = 0.5\,x + 0.5\,x' \tag{10}$$
where $x$ and $x'$ represent the decoded parameter of the current frame and the corresponding
parameter of the previous frame, respectively.
4.2 Pitch Decoding 

The pitch decoding is performed as shown in Table 4. For unvoiced frames, the pitch 
value is set to 50 samples. 
4.3 LSF Decoding

The LSF's are decoded as described in Section 4.4 and Table 5. The LSF's are checked
for ascending order and minimum separation.

4.4 Gain decoding

The gain index is used to retrieve a codeword containing six gain parameters from the 10- 
bit VQ gain codebook. 

4.5 Decoding of Bandpass Voicing

In the unvoiced frames, all of the bandpass voicing strengths are set to zero. In the
voiced frames, $Vbp_1$ is set to 1 and the remaining voicing patterns are decoded as shown in
Table 2.

4.6 Decoding of Fourier Magnitudes 

The Fourier magnitudes of unvoiced frames are set equal to 1. For the last voiced frame
of the current superframe, the Fourier magnitudes are decoded directly. The Fourier magnitudes
of other voiced frames are generated by repetition or linear interpolation as shown in Table 7.

4.7 Aperiodic Flag Decoding 

The aperiodic flags are obtained from the new flag as shown in Table 8. The jitter is set 
to 25% if the aperiodic flag is 1, otherwise the jitter is set to 0%. 
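
A minimal sketch of this lookup follows. The per-frame flag patterns are one reading of Table 8 and are assumptions, in particular the all-clear pattern for VVV with new flag 0, whose cell is not legible in the table; the function name is hypothetical.

```python
# U/V pattern -> (frame flags when new flag = 0, frame flags when new flag = 1).
# "J" means the aperiodic flag is set for that frame, "-" means not set.
APERIODIC_TABLE = {
    "UUU": ("JJJ", "JJJ"),
    "UUV": ("JJ-", "JJJ"),
    "UVU": ("J-J", "JJJ"),
    "VUU": ("-JJ", "JJJ"),
    "UVV": ("J--", "JJ-"),
    "VVU": ("--J", "-JJ"),
    "VUV": ("-J-", "-J-"),
    "VVV": ("---", "JJJ"),  # "---" is assumed, the table cell is illegible
}


def decode_jitter(uv_pattern, new_flag):
    """Return the jitter (0.25 or 0.0) for each of the three frames."""
    flags = APERIODIC_TABLE[uv_pattern][new_flag]
    return [0.25 if flag == "J" else 0.0 for flag in flags]
```

For a UVV superframe with the new flag set, this yields a jitter of 25% for the first two frames and 0% for the third.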

4.8 MELP Synthesis

The basic structure of the decoder is the same as in the MELP standard except that a new
harmonic synthesis method is introduced to generate the excitation signal for each pitch cycle.
In the original 2.4 kbps MELP algorithm, the mixed excitation is generated as the sum of the
filtered pulse and noise excitations. The pulse excitation is computed using an inverse discrete
Fourier transform (IDFT) of one pitch period in length and the noise excitation is generated in
the time domain. In the new harmonic synthesis algorithm, the mixed excitation is generated
completely in the frequency domain and then an inverse discrete Fourier transform operation is
performed to convert it into the time domain. This avoids the need for bandpass filtering of the
pulse and noise excitations, thereby reducing complexity of the decoder.

In the new harmonic synthesis procedure, the excitation in the frequency domain is
generated for each pitch cycle based on the cutoff frequency and the Fourier magnitude vector
$A_l$, $l = 1, 2, \ldots, L$. The cutoff frequency is obtained from the bandpass voicing parameters as
previously described and it is then interpolated for each pitch cycle. The Fourier magnitudes are
interpolated in the same way as in the MELP standard.

With the pitch length denoted as $N$, the corresponding fundamental frequency is
described by $f_0 = 2\pi/N$. The Fourier magnitude vector length is then given by $L = N/2$. Two
transition frequencies $F_H$ and $F_L$ are determined from the cutoff frequency $F$ employing an
empirically derived algorithm as follows:



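$$
F_H = \begin{cases}
0.85F, & 0\,\text{Hz} \le F < 500\,\text{Hz} \\
0.95F, & 500\,\text{Hz} \le F < 1000\,\text{Hz} \\
0.98F, & 1000\,\text{Hz} \le F < 2000\,\text{Hz} \\
0.95F, & 2000\,\text{Hz} \le F < 3000\,\text{Hz} \\
0.92F, & 3000\,\text{Hz} \le F < 4000\,\text{Hz}
\end{cases}
\qquad
F_L = \begin{cases}
1.05F, & 0\,\text{Hz} \le F < 500\,\text{Hz} \\
1.05F, & 500\,\text{Hz} \le F < 1000\,\text{Hz} \\
1.02F, & 1000\,\text{Hz} \le F < 2000\,\text{Hz} \\
1.05F, & 2000\,\text{Hz} \le F < 3000\,\text{Hz} \\
1.00F, & 3000\,\text{Hz} \le F < 4000\,\text{Hz}
\end{cases}
$$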



These transition frequencies are equivalent to two frequency component indices $V_H$ and $V_L$. A
voiced model is used for all the frequency samples below $V_L$, a mixed model is used for
frequency samples between $V_L$ and $V_H$, and an unvoiced model is used for frequency samples
above $V_H$. To define the mixed mode, a gain factor $g$ is selected with the value depending on the
cutoff frequency (the higher the cutoff frequency $F$, the smaller the gain factor):

$$
g = \begin{cases}
1.0, & 0\,\text{Hz} \le F < 500\,\text{Hz} \\
0.9, & 500\,\text{Hz} \le F < 1000\,\text{Hz} \\
0.8, & 1000\,\text{Hz} \le F < 2000\,\text{Hz} \\
0.75, & 2000\,\text{Hz} \le F < 3000\,\text{Hz} \\
0.7, & 3000\,\text{Hz} \le F < 4000\,\text{Hz}
\end{cases}
$$

The magnitude and phase of the frequency components of the excitation are determined
as follows:

$$
|M(l)| = \begin{cases}
A_l, & l < V_L \\
\dfrac{l - V_L}{V_H - V_L}\, g\, A_l + \dfrac{V_H - l}{V_H - V_L}\, A_l, & V_L \le l \le V_H \\
g\, A_l, & l > V_H
\end{cases}
\tag{11}
$$

$$
\varphi(l) = \begin{cases}
l\,\phi_0, & l < V_L \\
l\,\phi_0 + \dfrac{l - V_L}{V_H - V_L}\, \phi_{RND}(l), & V_L \le l \le V_H \\
\phi_{RND}(l), & l > V_H
\end{cases}
\tag{12}
$$

where $l$ is an index identifying a particular frequency component of the IDFT frequency range
and $\phi_0$ is a constant selected so as to avoid a pitch pulse at the pitch cycle boundary. The phase
$\phi_{RND}(l)$ is a uniformly distributed random number between $-2\pi$ and $2\pi$, independently
generated for each value of $l$.

In other words, the spectrum of the mixed excitation signal in each pitch period is
modeled by considering three regions of the spectrum, as determined by the cutoff frequency,
which determines a transition interval from $F_L$ to $F_H$. In the low region, from 0 to $F_L$, the Fourier
magnitudes directly determine the spectrum. In the high region, above $F_H$, the Fourier
magnitudes are scaled down by the gain factor $g$. In the transition region, from $F_L$ to $F_H$, the
Fourier magnitudes are scaled by a linearly decreasing weighting factor that drops from unity to
$g$ across the transition region. A linearly increasing phase is used for the low region, and random
phases are used for the high region. In the transition region, the phase is the sum of the linear
phase and a weighted random phase with the weight increasing linearly from 0 to 1 across the
transition region. The frequency samples of the mixed excitation are then converted to the time
domain using an inverse Discrete Fourier Transform.
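
The three-region model described above can be sketched for a single pitch cycle as follows. This is an illustrative sketch, not the coder's implementation: the function name and argument names are hypothetical, the region indices `v_low` and `v_high` and the gain `g` are taken as inputs rather than derived from the cutoff frequency, the random-phase range of plus or minus 2π is an assumption, and a direct (slow) IDFT is used for clarity.

```python
import cmath
import math
import random


def mixed_excitation(A, N, v_low, v_high, g, phi0, rng=None):
    """Synthesize one pitch cycle of excitation as N real samples.

    A[l - 1] holds the Fourier magnitude A_l for l = 1..L, with L <= N // 2.
    """
    rng = rng or random.Random(0)
    L = len(A)
    if L > N // 2:
        raise ValueError("need len(A) <= N // 2")
    span = v_high - v_low
    spectrum = [0j] * N
    for l in range(1, L + 1):
        rnd = rng.uniform(-2.0 * math.pi, 2.0 * math.pi)
        if l < v_low:                      # low region: voiced model
            weight, phase = 1.0, l * phi0
        elif l <= v_high:                  # transition region: mixed model
            t = (l - v_low) / span if span > 0 else 0.0
            weight = (1.0 - t) + g * t     # falls linearly from 1 to g
            phase = l * phi0 + t * rnd     # random part grows from 0 to 1
        else:                              # high region: unvoiced model
            weight, phase = g, rnd
        value = weight * A[l - 1] * cmath.exp(1j * phase)
        if 2 * l == N:
            spectrum[l] = complex(value.real, 0.0)  # Nyquist bin must be real
        else:
            spectrum[l] = value
            spectrum[N - l] = value.conjugate()     # symmetry gives a real signal
    # Direct inverse DFT, O(N^2), adequate for one short pitch cycle.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
        for n in range(N)
    ]
```

Because the whole spectrum is shaped before a single inverse transform, no bandpass filtering of separate pulse and noise signals is needed, which is the complexity saving the text refers to.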
5. TRANSCODER 
5.1 Concepts 

In some applications, it is important to allow interoperation between two different speech
coding schemes. In particular, it is useful to allow interoperability between a 2400 bps MELP
coder and a 1200 bps superframe coder. The general operation of a transcoder is illustrated in
the block diagrams of FIGS. 5A and 5B. In the up-converting transcoder 70 of FIG. 5A, speech
is input 72 to a 1200 bps vocoder 74 whose output is an encoded bit stream at 1200 bps 76 which
is converted by the "Up-Transcoder" 78 into a 2400 bps bit stream 80 in a form allowing it to be
decoded by a 2400 bps MELP decoder 82, that outputs synthesized speech 84. Conversely, in
the down-converting transcoder 90 of FIG. 5B, speech is input 92 to a 2400 bps MELP encoder
94, which outputs a 2400 bps bit stream 96 into a "Down-Transcoder" 98, that converts the
parametric data stream into a 1200 bps bit stream 100 that can be decoded by the 1200 bps
decoder 102, that outputs synthesized speech 104. In full-duplex (two-way) voice
communication both the up-transcoder and the down-transcoder are needed to provide
interoperability.

A simple way to implement an up-transcoder is to decode the 1200 bps bit stream with a
1200 bps decoder to obtain a raw digital representation of the recovered speech signal which is
then re-encoded with a 2400 bps encoder. Similarly, a simple method for implementing a
down-transcoder is to decode the 2400 bps bit stream with a 2400 bps decoder to obtain a raw digital
representation of the recovered speech signal which is then re-encoded with a 1200 bps encoder.
This approach to implementing up and down transcoders corresponds to what is called
"tandem" encoding and has the disadvantages that the voice quality is substantially degraded and
the complexity of the transcoder is unnecessarily high. Transcoder efficiency is improved with
the following method for transcoding that reduces complexity while avoiding much of the
quality degradation associated with tandem encoding.

5.2 Down-Transcoder 

In the down-transcoder, after synchronization and channel error correction decoding are
performed, the bits representing each parameter are separately extracted from the bit stream for
each of three consecutive frames (constituting a superframe) and the set of parameter
information is stored in a parameter buffer. Each parameter set consists of the values of a given
parameter for the three consecutive frames. The same methods used to quantize superframe
parameters are applied here to each parameter set for recoding into the lower-rate bit stream. For
example, the pitch and U/V decision for each of 3 frames in a superframe is applied to the pitch
and U/V quantization scheme described in Section 3.2. In this case, the parameter set consists of
3 pitch values each represented with 7 bits and 3 U/V decisions each given by 1 bit, giving a total
of 24 bits. This is extracted from the 2400 bps bit stream and the recoding operation converts
this into 12 bits to represent the pitch and voicing for the superframe. In this way, the
down-transcoder does not have to perform the MELP analysis functions and only performs the needed
quantization operations for the superframe. Note that the parity check bit, synchronization bit,
and error correction bits must be regenerated as part of the down-transcoding operation.

5.3 Up-Transcoder 

In the case of an up-transcoder, the input bit stream of 1200 bps contains quantized
parameters for each superframe. After synchronization and error correction decoding are
performed, the up-transcoder extracts the bits representing each parameter for the superframe,
which are mapped (recoded) into a larger number of bits that specify separately the
corresponding values of that parameter for each of the three frames in the current superframe.
The method of performing this mapping, which is parameter dependent, is described below.
Once all parameters for a frame of the superframe have been determined, the sequence of bits
representing three frames of speech is generated. From this data sequence, the 2400 bps bit
stream is generated, after insertion of the synchronization bit, parity bit, and error correction
encoding.

The following is a description of the general approach to mapping (decoding) the
parameter bits for a superframe into separate parameter bits for each of the three frames.
Quantization tables and codebooks are used in the 1200 bps decoder for each parameter as
described previously. The decoding operation takes a binary word that represents one or more
parameters and outputs a value for each parameter, e.g. a particular LSF value or pitch value as
stored in a codebook. The parameter values are requantized, i.e. applied as input to a new
quantizing operation employing the quantization tables of the 2400 bps MELP coder. This
requantization leads to a new binary word that represents the parameter values in a form suitable
for decoding by the 2400 bps MELP decoder.

As an example to illustrate the use of requantization, from the 1200 bps bit stream, the
bits containing the pitch and voicing information for a particular superframe are extracted and
decoded into 3 voicing (V/U) decisions and 3 pitch values for the 3 frames in the superframe.
The 3 voicing decisions are binary and are directly usable as the voicing bits for the 2400 bps
MELP bitstream (one bit for each of 3 frames). The 3 pitch values are requantized by applying
each to the MELP pitch scalar quantizer, obtaining a 7-bit word for each pitch value. Numerous
alternative implementations of pitch requantization which follow the inventive method described
can be designed by a person skilled in the art.

One specific alteration can be created by bypassing pitch requantization when only a
single frame of the superframe is voiced, since in this case the pitch value for the voiced frame is
already specified in quantized form consistent with the format of the MELP vocoder. Similarly,
for the Fourier magnitudes, requantization is not needed for the last frame of a superframe since
it has already been scalar quantized in the MELP format. However, the interpolated Fourier
magnitudes for the other two frames of the superframe need to be requantized by the MELP
quantization scheme. The jitter, or aperiodic flag, is simply obtained by table lookup using the
last two columns of Table 8.

6. DIGITAL VOCODER TERMINAL HARDWARE 

FIG. 6 shows a digital vocoder terminal containing an encoder and decoder that operate
in accordance with the voice coding methods and apparatus of this invention. The microphone
MIC 112 is an input speech transducer providing an analog output signal 114 which is sampled
and digitized by an Analog to Digital Converter (A/D) 116. The resulting sampled and digitized
speech 118 is digitally processed and compressed within a DSP/controller chip 120, by the voice
encoding operations performed in the Encode block 122, which is implemented in software
within the DSP/Controller according to the invention.

The digital signal processor (DSP) 120 is exemplified by the Texas Instruments
TMS320C5416 integrated circuit, which contains random access memory (RAM) providing
sufficient buffer space for storing speech data and intermediate data and parameters; the DSP
circuit also contains read-only memory (ROM) for containing the program instructions, as
previously described, to implement the vocoder operations. A DSP is well suited for performing
the vocoder operations described in this invention. The resultant bitstream 124 from the encoding
operation is a low-rate Tx data stream. The Tx data 124 enters a Channel
Interface Unit 126 to be transmitted over a channel 128.

On the receiving side, data from a channel 128 enters a Channel Interface Unit 126 which
outputs an Rx bit-stream 130. The Rx data 130 is applied to a set of voice decoding operations
within the decode block; the operations have been previously described. The resulting sampled
and digitized speech 134 is applied to a Digital to Analog Converter (D/A) 136. The D/A
outputs reconstructed analog speech 138. The reconstructed analog speech 138 is applied to a
speaker 140, or other audio transducer which reproduces the reconstructed sound.

FIG. 6 is a representation of one configuration of hardware on which the inventive
principles may be practiced. The inventive principles may be practiced on various forms of
vocoder implementations that can support the processing functions described herein for the
encoding and decoding of the speech data. Specifically, the following are but a few of the many
variations included within the scope of the inventive implementation:

(a) Using Channel Interface Units which contain a voiceband data modem for use
when the transmission path is a conventional telephone line.

(b) Using digital signals that are encrypted for transmission and decrypted for reception via a
suitable encryption device to provide secure transmission. In this case, the encryption unit would
also be contained in the Channel Interface Unit.

(c) Using a Channel Interface Unit that contains a radio frequency modulator and
demodulator for wireless signal transmission by radio waves for cases in which the transmission
channel is a wireless radio link.

(d) Using a Channel Interface Unit that contains multiplexing and demultiplexing
equipment for sharing a common transmission channel with multiple voice and/or data channels.
In this case multiple Tx and Rx signals would be connected to the Channel Interface Unit.

(e) Employing discrete components, or a mix of discrete elements and processing
elements, to replace the instruction processing operations of the DSP/Controller. Examples that
could be employed include programmable gate arrays (PGAs). It must be noted that the
invention can be fully reduced to practice in hardware, without the need of a processing element.

Hardware to support the inventive principles need only support the data operations
described. However, DSP/processor chips are the most common circuits used for
implementing speech coders or vocoders in the current state of the art.

Although the description above contains many specificities, these should not be construed
as limiting the scope of the invention but as merely providing illustrations of some of the
presently preferred embodiments of this invention. Thus the scope of this invention should be
determined by the appended claims and their legal equivalents.






Table 1. Bit Allocation of both 2.4 kbps and 1.2 kbps Coding Schemes

Bits for quantization of three frames (540 samples)

Parameters                    2.4 kbps  2.4 kbps  1.2 kbps  1.2 kbps  1.2 kbps  1.2 kbps  1.2 kbps
                              Voiced    Unvoiced  State 1   State 2   State 3   State 4   State 5
----------------------------  --------  --------  --------  --------  --------  --------  --------
Pitch & Global U/V Decisions  7*3       7*3       12        12        12        12        12
Parity                        0         0         1         1         1         1         1
LSF's                         25*3      25*3      42        42        39        42        27
Gains                         8*3       8*3       10        10        10        10        10
Bandpass Voicing              4*3       0         6         4         4         2         0
Fourier Magnitudes            8*3       0         8         8         8         8         0
Jitter                        1*3       0         1         1         1         1         0
Synchronization               1*3       1*3       1         1         1         1         1
Error Protection              0         13*3      0         2         5         4         30
Total                         162       162       81        81        81        81        81

*Note: 1.2 kbps State 1: All three frames are voiced.
1.2 kbps State 2: One of the first two frames is unvoiced, other frames are voiced.
1.2 kbps State 3: The 1st and 2nd frames are voiced. The 3rd frame is unvoiced.
1.2 kbps State 4: One of the three frames is voiced, other two frames are unvoiced.
1.2 kbps State 5: All three frames are unvoiced.
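
As a consistency check on Table 1, the column totals can be converted to bit rates. The sketch below assumes an 8 kHz sampling rate (as in standard MELP; the rate is not stated in this section), so that the 540-sample superframe lasts 67.5 ms. The dictionary and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000      # assumption: 8 kHz sampling, as in standard MELP
SUPERFRAME_SAMPLES = 540   # three frames, from the Table 1 heading

# Row order: pitch and U/V, parity, LSF's, gains, bandpass voicing,
# Fourier magnitudes, jitter, synchronization, error protection.
ALLOCATION = {
    "2.4 kbps voiced":   [7 * 3, 0, 25 * 3, 8 * 3, 4 * 3, 8 * 3, 1 * 3, 1 * 3, 0],
    "2.4 kbps unvoiced": [7 * 3, 0, 25 * 3, 8 * 3, 0, 0, 0, 1 * 3, 13 * 3],
    "1.2 kbps state 1":  [12, 1, 42, 10, 6, 8, 1, 1, 0],
    "1.2 kbps state 2":  [12, 1, 42, 10, 4, 8, 1, 1, 2],
    "1.2 kbps state 3":  [12, 1, 39, 10, 4, 8, 1, 1, 5],
    "1.2 kbps state 4":  [12, 1, 42, 10, 2, 8, 1, 1, 4],
    "1.2 kbps state 5":  [12, 1, 27, 10, 0, 0, 0, 1, 30],
}


def bit_rate(column):
    """Bits per second implied by one column of the allocation table."""
    return sum(ALLOCATION[column]) * SAMPLE_RATE_HZ / SUPERFRAME_SAMPLES
```

Under that assumption every 1.2 kbps state sums to 81 bits per superframe, which is 1200 bps, and both 2.4 kbps columns sum to 162 bits, which is 2400 bps.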

Table 2. Bandpass voicing index mapping

Codeword              0000        1000        1100        1111
--------------------  ----------  ----------  ----------  ----------
Voicing patterns      0000        1000        1100        0111
assigned to           0001        1001                    1011
the codeword          0010        1010                    1101
                      0011                                1110
                      0100                                1111
                      0101
                      0110
Cutoff frequency      500 Hz      1000 Hz     2000 Hz     4000 Hz

Table 3. Pitch quantization schemes

U/V pattern     Pitch quantization method
--------------  ------------------------------------------------------------------
UUU             N/A
UUV, UVU, VUU   The pitch of the only voiced frame is scalar quantized using a
                7-bit quantizer.
UVV, VUV, VVU   The pitches of the voiced frames are quantized using the same VQ
                as for the VVV case. A weighting function is applied which takes
                into account the U/V information.
VVV             Vector quantization of three pitches

Table 4. Joint quantization scheme of pitch and voicing decisions

U/V patterns        3-bit codebook  9-bit codebooks
------------------  --------------  ------------------------------------------------------
UUU, UUV, UVU, VUU  000             The pitch value is quantized with the same 99-level
                                    uniform quantizer as in the 2.4 kbps standard. The
                                    pitch value and U/V pattern are then mapped to a
                                    codevector in this 9-bit codebook.
VVU                 001             These U/V patterns share the same codebook
VUV                 010             containing 512 codevectors of the pitch triple.
UVV                 100
VVV                 011             512-entry codebook A
                    101             512-entry codebook B
                    110             512-entry codebook C
                    111             512-entry codebook D

Table 5. Bit allocation for LSF quantization according to U/V decisions

U/V pattern    LSF of    LSF of    LSF of    Interpolation  Residual of      Total
               frame 1   frame 2   frame 3                  frames 1 and 2
-------------  --------  --------  --------  -------------  ---------------  -----
UUU            9         9         9         0              0                27
VUU            8+6+5+5   9         9         0              0                42
UVU            9         8+6+5+5   9         0              0                42
UUV            9         9         8+6+5+5   0              0                42
UVV, VUV, VVV  0         0         8+6+5+5   4              8+6              42
VVU            0         0         9         4              8+6+6+6          39

Table 6. Bit Allocation for bandpass voicing quantization

U/V decisions pattern                   VVV   VVU, VUV, UVV   VUU, UVU, UUV   UUU
--------------------------------------  ----  --------------  --------------  ----
Bits for bandpass voicing information   6     4               2               0






Table 7. Fourier magnitude vector quantization 



U/V pattern
for current
superframe


U/V decision for the last frame of the previous superframe 


U 


V 


uuu 


N/A 


vuu 


/,=0(/,) 


uvu 


/ 2 =0(/ 2 ) 


uuv 




uw 


f> = Q(f 3 ), f 2 =f 3 


vuv 




f 3 =Q(f 3 ), /,=/<> 


wu 


/ 2 =e(/ 2 ), 


f2=Q(f 2 X h = ^^ 


vw 


/ 2 =e(/ 2 x /,=/ 2 =/3 


h~ 3 ,h- 3 

Table 8. Aperiodic flag quantization using 1 bit

U/V pattern  Quantization procedure                            Quantization patterns
                                                               New flag = 0  New flag = 1
-----------  ------------------------------------------------  ------------  ------------
UUU          N/A                                               JJJ           JJJ
UUV          If the voiced frame has aperiodic flag,           JJ-           JJJ
UVU          set new flag.                                     J-J           JJJ
VUU                                                            -JJ           JJJ
UVV          If the second frame has aperiodic flag,           J--           JJ-
VVU          set new flag.                                     --J           -JJ
VUV          N/A                                               -J-           -J-
VVV          If > 1 frame has the aperiodic flag set,          ---           JJJ
             set new flag.

Table 9. Mode protection schemes

U/V pattern  3-bit codebook of joint   Bit pattern of      Bit pattern of      Bit pattern
             quantization for pitch    bandpass voicing 1  bandpass voicing 2  of LSF
             and U/V decisions
-----------  ------------------------  ------------------  ------------------  -----------
UUU          000                       00                  00
UUV          000                       00                  01
UVU          000                       00                  10
VUU          000                       00                  11
VVU          001                       01                                      0101
VUV          010                       10
UVV          100                       11
VVV          011, 101, 110, 111

Table 10. Parameter decoding schemes if a mode error is detected

U/V pattern:            UUU, UUV, UVU, VUU
Corrected U/V pattern:  UUU
LSF's:                  Repeat LSF's of the last frame in the previous superframe
Gain:                   Decode and apply smoothing
Bandpass voicing:       Set to 0
Fourier magnitude:      Set all magnitudes to 1

U/V pattern:            VVU, VUV, UVV
Corrected U/V pattern:  VVV
LSF's:                  Decode and apply smoothing
Gain:                   Decode and apply smoothing
Pitch:                  Decode and apply smoothing
Bandpass voicing:       Set the first band to 1, others to 0

CLAIMS

What is claimed is: 

1 . A vocoder apparatus, comprising: 

(a) a superframe buffer for receiving multiple frames of voice data; 

(b) a frame-based voice encoder analysis module for extracting parametric voice data 
from each frame within the superframe buffer; 

(c) a superframe encoder for receiving parametric voice data for a series of frames 
within the superframe buffer from the analysis module, wherein parametric voice data received 
from the analysis module is selectively quantized to produce voice data which is encoded into an 
outgoing digital bit stream for transmission; 

(d) a superframe decoder for receiving and decoding a digital bit stream encoded with 
superframe voice data into quantized frame-based parameters; and 

(e) a frame-based decoder synthesizer for receiving the quantized parameters for each 
frame and decoding the quantized parameters into a synthesized voice output. 

2. A voice compression apparatus, comprising: 

(a) a superframe buffer for receiving multiple frames of voice data; 

(b) a frame-based encoder analysis module for analyzing characteristics of voice data 
within frames contained in the superframe to produce an associated set of voice data parameters; 
and 

(c) a superframe encoder for receiving voice data parameters from the analysis
module for a group of frames contained within the superframe buffer, for reducing by analysis
data for the group of frames and for quantizing and encoding said data into an outgoing digital
bit stream for transmission.



3. A voice compression apparatus as recited in claim 2, wherein the analysis module 
is capable of receiving voice data parameters is selected from the group of voice encoders 
consisting of linear predictive coders, mixed-excitation linear prediction coders, harmonic 
coders, and multiband excitation coders. 

4. A voice compression apparatus as recited in claim 2, wherein said superframe 
encoder includes at least two parametric processing modules selected from the group of 
parametric processing modules consisting of pitch smoothers, bandpass voicing smoothers, linear 
predictive quantizers, jitter quantizers, and Fourier magnitude quantizers. 

5. A voice compression apparatus as recited in claim 2, wherein said superframe 
encoder includes a vector quantizer wherein pitch values within a superframe are vector 
quantized with a distortion measure responsive to pitch errors. 

6. A voice compression apparatus as recited in claim 2, wherein said superframe 
encoder includes a vector quantizer wherein pitch values within a superframe are vector 
quantized with a distortion measure responsive to pitch differentials as well as pitch errors. 

7. A voice compression apparatus as recited in claim 2, wherein said superframe
encoder includes a quantizer of linear prediction parameters, wherein quantization is performed
with a codebook-based interpolation of linear prediction parameters that employ different
interpolation coefficients for each linear prediction parameter, and wherein said quantizer
operates in closed loop mode to minimize overall error over a number of frames.

8. A voice compression apparatus as recited in claim 7, wherein said quantizer is
capable of performing a line spectral frequency (LSF) quantization using said codebook-based
interpolation.

9. A voice compression apparatus as recited in claim 8, wherein said codebook is 
created by means of a training database operated on by a centroid-based training procedure. 

10. A voice compression apparatus as recited in claim 2, wherein said superframe
encoder includes a pitch smoother wherein calculations are based on an onset/offset classifier.

11. A voice compression apparatus as recited in claim 2, wherein said superframe
encoder includes a pitch smoother wherein pitch trajectory is calculated using a plurality of
voicing decisions.

12. A voice compression apparatus as recited in claim 11, wherein said pitch
smoother classifies frames into onset and offset frames based on at least four waveform feature
parameters selected from the group of waveform feature parameters consisting of energy,
zero-crossing rate, peakiness, maximum correlation coefficient of input speech, maximum correlation
coefficient of 500 Hz low pass filtered speech, energy of low pass filtered speech, and energy of
high pass filtered speech.

13. A voice compression apparatus as recited in claim 2, wherein said superframe
encoder includes a bandpass voicing smoother for mapping multiband voicing decisions for each 
frame into a single cutoff frequency for that frame, wherein said cutoff frequency takes on one 
value from a predetermined list of allowable values. 

14. A voice compression apparatus as recited in claim 13, wherein said bandpass 
voicing smoother performs smoothing by modifying the cutoff frequency of a frame as a 
function of the cutoff frequencies of neighboring frames and the average frame energy. 

15. A voice compression apparatus as recited in claim 2, further comprising means 
for compressing aperiodic flag bits for each frame in a superframe into a single bit per 
superframe, which bit is created based on the distribution of voiced and unvoiced frames within 
the superframe. 

16. A voice compression apparatus as recited in claim 2, wherein said superframe 
encoder includes a plurality of quantizers for encoding parametric data into a set of bits, wherein 
at least one of said quantizers employs vector quantization to represent interpolation coefficients. 

17. A voice compression apparatus as recited in claim 2, wherein a superframe is 
categorized into one of a plurality of coding states based on the combination of voiced and 
unvoiced frames within the superframe, and wherein each of said coding states is associated with 
a different bit allocation to be used with the superframe. 
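
The state selection recited in claim 17 can be sketched as a lookup keyed by the voiced/unvoiced pattern of the superframe. The bit packing order and the shape of the allocation table are assumptions; the table contents below are placeholders and not actual bit allocations.

```python
def coding_state(voicing):
    """Pack per-frame decisions (True = voiced) into an integer, first frame as the most significant bit."""
    state = 0
    for voiced in voicing:
        state = (state << 1) | (1 if voiced else 0)
    return state


def bit_allocation(voicing, allocation_table):
    """Look up the bit allocation associated with the coding state of this superframe."""
    return allocation_table[coding_state(voicing)]


# Placeholder table for a hypothetical three-frame superframe (2**3 = 8 states).
placeholder_table = {state: f"allocation-{state}" for state in range(8)}
selected = bit_allocation([True, False, True], placeholder_table)
```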

18. A voice compression apparatus, comprising: 

(a) a superframe buffer for receiving multiple frames of voice data; 

(b) a frame-based analysis module for determining a set of voice data parameters for 
said voice data; and 

(c) a superframe encoder for receiving unquantized voice data parameters for groups 
of frames within a superframe, said superframe encoder comprising 

(i) a pitch smoother for determining pitch and U/V decisions for each frame 
of the superframe and extracting parameters needed for frame classification into onset and 
offset frames, 

(ii) a bandpass voicing smoother for determining bandpass voicing strengths 
for the frames within the superframe and determining cutoff frequencies for each frame, 
and 

(iii) a parameter quantizer and encoder for quantizing and encoding voicing 
parameters received from said analysis module, said pitch smoother, and said bandpass 
voicing smoother into a set of bits and encoding said bits into an outgoing digital bit 
stream for transmission. 

19. A voice decoder apparatus, comprising: 

(a) a superframe decoder for receiving an incoming digital bit stream as a series of 
superframes and decoding and inverse quantizing said superframes into quantized frame-based 
voice parameters; and 

(b) a frame-based decoder for receiving said quantized frame-based voice parameters 
and combining said quantized frame-based voice parameters into a synthesized voice output 
signal. 

20. A method of decoding a parametric voice encoded data stream into an audio voice 
signal comprising the steps of: 

(a) buffering a received parametric voice data stream having a plurality of pitch 
periods and loading said buffered frame data into a buffer; 

(b) constructing an estimated spectrum of excitation within each pitch period by 
breaking down the frequency spectrum into regions based on cutoff frequency, wherein said 
construction comprises the steps of: 

(i) computing Fourier magnitude for each region, wherein the resultant 
computed Fourier magnitudes for at least one of said regions are then scaled by a gain 
factor computed for that region, 

(ii) computing phase within each region, wherein the resultant phase for at 
least one of said regions has been modified by use of a weighted random phase, and 

(iii) converting said Fourier magnitude and said phase within each region to a 
time domain representation by the computation of an inverse discrete Fourier transform; 
and 

(c) generating an analog voice signal from said time domain representation. 
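
Steps (b)(ii) and (b)(iii) of claim 20 can be illustrated by the following sketch, which builds a conjugate-symmetric spectrum from harmonic magnitudes and phases and converts it to one pitch period of real samples with a direct inverse discrete Fourier transform. The function names, the uniform random phase distribution, and the placement of harmonic k in DFT bin k are assumptions made for illustration.

```python
import cmath
import math
import random


def weighted_random_phase(base_phase, weight, rng):
    """Perturb a phase by a uniform random value in about [-pi, pi], scaled by weight."""
    return base_phase + weight * rng.uniform(-math.pi, math.pi)


def synthesize_pitch_period(magnitudes, phases, length):
    """Inverse DFT of harmonics k = 1..K placed in bins k and length - k."""
    if 2 * len(magnitudes) >= length:
        raise ValueError("harmonics must stay below the Nyquist bin")
    spectrum = [0j] * length
    for k, (mag, phase) in enumerate(zip(magnitudes, phases), start=1):
        spectrum[k] = cmath.rect(mag, phase)
        # Mirror bin keeps the time-domain output real.
        spectrum[length - k] = spectrum[k].conjugate()
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / length)
            for k in range(length)).real / length
        for n in range(length)
    ]
```

A weight of zero leaves the phase unchanged, while larger weights make the corresponding region of the spectrum more noise-like.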

21. A method as recited in claim 20, wherein said regions into which the 
frequency spectrum is broken down comprise: 

(a) a lower region wherein Fourier magnitudes directly determine the spectrum; 

(b) a transition region wherein Fourier magnitudes are scaled down by a linearly 
decreasing weighting factor that drops from unity to a nonzero positive value dependent on the 
cutoff frequency of the current frame; and 

(c) an upper region wherein Fourier magnitudes are scaled down by a weighting 
factor dependent on the cutoff frequency of the current frame. 
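
The three regions recited in claim 21 can be sketched as a piecewise weighting function. The region boundaries (lower region up to the cutoff frequency, transition region of a fixed width above it) and the parameters `transition_hz` and `floor` are hypothetical; the claim states only that the final weight depends on the cutoff frequency, so the caller is assumed to supply it.

```python
def magnitude_weight(freq_hz, cutoff_hz, transition_hz, floor):
    """Weight applied to a Fourier magnitude at freq_hz.

    Lower region      : weight 1.0 (magnitudes used directly)
    Transition region : weight falls linearly from 1.0 to floor
    Upper region      : constant weight floor (nonzero and positive)
    """
    if not 0.0 < floor <= 1.0:
        raise ValueError("floor must be in (0, 1]")
    if transition_hz <= 0:
        raise ValueError("transition_hz must be positive")
    if freq_hz <= cutoff_hz:
        return 1.0
    if freq_hz >= cutoff_hz + transition_hz:
        return floor
    fraction = (freq_hz - cutoff_hz) / transition_hz
    return 1.0 - fraction * (1.0 - floor)
```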

22. An up-transcoder apparatus which receives a superframe encoded voice data 
stream and converts it to a frame-based encoded voice data stream, comprising: 

(a) a superframe buffer for collecting superframe data and extracting bits representing 
superframe parameters; 

(b) a decoder for inverse quantizing the bits for each set of superframe parameters 
into a set of quantized parameter values for each frame of the superframe; and 

(c) a frame-based encoder for quantizing the voice parameters for each of the 
underlying frames, mapping said quantized voice parameters into frame-based data, and 
producing a frame-based voice data stream. 

23. A down-transcoder apparatus which receives an encoded frame-based voice data 
stream and converts it into a superframe-based encoded voice data stream, comprising: 

(a) a superframe buffer for collecting a number of frames of parametric voice data 
and extracting bits representing frame-based voice parameters; 

(b) a decoder for inverse quantizing the bits for each frame of parameters into 
quantized parameter values for each frame; and 

(c) a superframe encoder for collecting said quantized frame-based parameters for the 
group of frames within the superframe, producing a set of parametric voice data, and quantizing 
and encoding said parametric voice data into an outgoing digital bit stream. 

24. A vocoder method for encoding digitized voice into parametric voice data, 
comprising the steps of: 

(a) loading multiple frames of digitized voice into a superframe buffer; 

(b) encoding digitized voice within each frame of the superframe buffer by 
parametric analysis to produce frame-based parametric voice data; 

(c) classifying frames as onset frames and offset frames by calculating pitch and U/V 
parameters within each frame of the superframe; 

(d) determining a cutoff frequency for each frame within the superframe by 
calculating a bandpass voicing strength parameter for the frames within the superframe buffer; 

(e) collecting a set of superframe parameters from the parametric analysis, frame 
classification, and cutoff frequency determination steps for the group of frames within the 
superframe; 

(f) quantizing the superframe parameters into discrete values represented by a 
reduced set of data bits that form quantized superframe parameter data; and 

(g) encoding quantized superframe parameter data into a data stream of superframe-based 
parametric voice data that contains substantially equivalent voice information to the frame-based 
parametric voice data, yet at a lower bit per second rate of encoded voice. 

25. A vocoder method for producing digitized voice from superframe-based 
parametric voice data, comprising the steps of: 

(a) receiving superframe-based parametric voice data in a superframe buffer; 

(b) decoding and inverse quantizing the voice data within the superframe buffer to 
recreate a set of frame-based voice parameter values; and 

(c) decoding the frame-based voice parameters with a frame-based voice synthesizer 
which decodes the frame-based voice parameters to produce a digitized voice output. 

ABSTRACT OF THE DISCLOSURE 
An enhanced low bit rate parametric voice coder that groups a number of frames from an 
underlying frame-based vocoder, such as MELP, into a superframe structure. Parameters are 
extracted from the group of underlying frames and quantized into the superframe, which allows 
the bit rate of the underlying coding to be reduced without increasing the distortion. The speech 
data coded in the superframe structure can then be directly synthesized to speech or may be 
transcoded to a format so that an underlying frame-based vocoder performs the synthesis. The 
superframe structure includes additional error detection and correction data to reduce the 
distortion caused by bit errors introduced during communication. 

DRAWING(S) 
There are attached five (5) sheets of drawings. 

EXECUTED OATH OR DECLARATION 
An executed declaration shall follow. 

SEQUENCE LISTING 

Not Applicable 